How to Handle Missing Data for Seasonal Analysis in Dataiku?
raihanhd
Registered Posts: 3 ✭✭
Hi Dataiku Community,
I’m working on a dataset containing daily commodity prices over multiple years (2021-2024). However, there are significant gaps in the data, which are affecting my ability to analyze trends.
Here are the details:
- Daily Data
- The dataset records daily prices for various commodities, but many days are missing.
- Example:
Commodity | Date | Price (Rp)
Example A | 2021-03-17 | 13,000
Example A | 2021-12-26 | 12,440
Example A | 2023-02-14 | 12,480
Example A | 2024-04-18 | 17,030
- Some commodities have no data at all for certain years (e.g., Example B has no data for 2021-2022).
- Yearly Average Data
- I calculated yearly averages for each commodity to summarize the data.
- Example:
Commodity | Year | Average Price (Rp) | Days Counted
Example A | 2021 | NULL | 365
Example A | 2022 | NULL | 365
Example A | 2023 | 13,500 | 365
Example A | 2024 | NULL | 365
- Even with yearly averages, some commodities still have null values due to completely missing raw data for certain years.
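To make the yearly summary concrete, here is a minimal pandas sketch of one way such a table could be built, for instance inside a Python recipe. It is an illustration only and not necessarily how the table above was produced: the column names (`commodity`, `date`, `price`) and the `yearly_summary` helper are assumptions, and it uses only the four example rows, so the numbers will differ from the real summary. It counts the days that actually have a price, which makes sparse years easy to spot.

```python
import pandas as pd

# The four example rows from the daily data (hypothetical column names).
raw = pd.DataFrame({
    "commodity": ["Example A"] * 4,
    "date": pd.to_datetime(
        ["2021-03-17", "2021-12-26", "2023-02-14", "2024-04-18"]
    ),
    "price": [13000, 12440, 12480, 17030],
})


def yearly_summary(df, years):
    """Average price and number of observed days per commodity and year."""
    df = df.assign(year=df["date"].dt.year.astype("int64"))
    agg = df.groupby(["commodity", "year"]).agg(
        avg_price=("price", "mean"),
        days_observed=("price", "count"),
    )
    # Add a row for every commodity/year pair, even those with no raw data,
    # so fully missing years show up as NaN with 0 observed days.
    full_index = pd.MultiIndex.from_product(
        [df["commodity"].unique(), list(years)], names=["commodity", "year"]
    )
    out = agg.reindex(full_index)
    out["days_observed"] = out["days_observed"].fillna(0).astype(int)
    return out.reset_index()


summary = yearly_summary(raw, range(2021, 2025))
print(summary)
```

With a `days_observed` column, a year whose average rests on one or two prices can be told apart from a year with near-complete coverage.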
My Questions:
- What is the best method to handle missing seasonal data like this in Dataiku?
- Are there any built-in features or plugins in Dataiku that can help fill missing data based on seasonal trends or interpolation?
- If interpolation or seasonal averages aren’t feasible due to limited data points, how can I effectively flag and handle these missing values for further analysis?
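As an illustration of the flagging idea in the last question, the sketch below uses plain pandas rather than any Dataiku-specific feature. It reindexes one commodity to a daily calendar, marks each missing day, records how long each gap is, and interpolates only gaps up to a chosen length. The `flag_and_fill` helper, the column names, the 7-day threshold, and the demo prices are all assumptions made for the example, not values from the real dataset.

```python
import pandas as pd


def flag_and_fill(df, max_gap_days=7):
    """Reindex one commodity to daily frequency, flag gaps, fill short ones.

    Expects 'date' (datetime, unique per commodity) and 'price' columns.
    """
    s = df.set_index("date")["price"].sort_index().asfreq("D")
    out = s.to_frame("price")
    out["is_missing"] = out["price"].isna()

    # Label consecutive runs of missing / present days, then measure
    # the length of each missing run (0 for runs of observed days).
    run_id = (out["is_missing"] != out["is_missing"].shift()).cumsum()
    out["gap_days"] = out.groupby(run_id)["is_missing"].transform("sum")

    # Time-weighted linear interpolation between observed prices only.
    interpolated = out["price"].interpolate(method="time", limit_area="inside")

    # Keep interpolated values only for gaps short enough to trust.
    short_gap = out["is_missing"] & (out["gap_days"] <= max_gap_days)
    out["price_filled"] = out["price"].where(~short_gap, interpolated)
    return out


# Made-up demo prices: one 2-day gap and one 15-day gap.
demo = pd.DataFrame({
    "commodity": ["Example A"] * 3,
    "date": pd.to_datetime(["2024-01-01", "2024-01-04", "2024-01-20"]),
    "price": [100.0, 130.0, 200.0],
})

# Apply per commodity and stack the results under a (commodity, date) index.
frames = {name: flag_and_fill(g) for name, g in demo.groupby("commodity")}
result = pd.concat(frames, names=["commodity"])
print(result.head(6))
```

Long gaps stay null in `price_filled` but remain visible through `is_missing` and `gap_days`, so later steps can exclude or treat them separately instead of relying on values interpolated across months.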
I’d really appreciate your insights and recommendations. Thanks in advance! 😊