How to Handle Missing Data for Seasonal Analysis in Dataiku?

raihanhd Registered Posts: 3 ✭✭
edited January 6 in General Discussion

Hi Dataiku Community,

I’m working on a dataset containing daily commodity prices over multiple years (2021-2024). However, there are significant gaps in the data, which are affecting my ability to analyze trends.

Here are the details:

  1. Daily Data
    • The dataset records daily prices for various commodities, but many days are missing.
    • Example:
      Commodity | Date | Price (Rp)
      Example A | 2021-03-17 | 13,000
      Example A | 2021-12-26 | 12,440
      Example A | 2023-02-14 | 12,480
      Example A | 2024-04-18 | 17,030
    • Some commodities have no data at all for certain years (e.g., Example B has no data for 2021-2022).
  2. Yearly Average Data
    • I calculated yearly averages for each commodity to summarize the data.
    • Example:
      Commodity | Year | Average Price (Rp) | Days Counted
      Example A | 2021 | NULL | 365
      Example A | 2022 | NULL | 365
      Example A | 2023 | 13,500 | 365
      Example A | 2024 | NULL | 365
    • Even after averaging by year, some commodity-year combinations are still NULL because there is no raw data at all for those years.
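
To make the gaps explicit, the daily data can be reindexed onto a complete calendar for each commodity and then summarized with both calendar days and observed days. Below is a minimal pandas sketch. It assumes the columns are named `commodity`, `date` and `price` (hypothetical names) and uses only the sample rows above. `complete_calendar` is a hypothetical helper, not a Dataiku feature. In a Dataiku Python recipe, the input frame would typically come from `dataiku.Dataset(name).get_dataframe()`.

```python
import pandas as pd

# Sample rows from the daily example above (prices in Rp); column names are assumed.
daily = pd.DataFrame({
    "commodity": ["Example A"] * 4,
    "date": pd.to_datetime(["2021-03-17", "2021-12-26", "2023-02-14", "2024-04-18"]),
    "price": [13000, 12440, 12480, 17030],
})


def complete_calendar(df: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Reindex each commodity onto a full daily calendar and flag missing days."""
    full_range = pd.date_range(start, end, freq="D")
    parts = []
    for commodity, grp in df.groupby("commodity"):
        # Days without an observation become NaN after reindexing.
        s = grp.set_index("date")["price"].reindex(full_range)
        part = s.rename("price").rename_axis("date").reset_index()
        part["commodity"] = commodity
        part["is_missing"] = part["price"].isna()
        parts.append(part)
    return pd.concat(parts, ignore_index=True)


calendar = complete_calendar(daily, "2021-01-01", "2024-12-31")

# Yearly summary: calendar days in the year vs. days that actually have a price.
yearly = (
    calendar.assign(year=calendar["date"].dt.year)
    .groupby(["commodity", "year"])
    .agg(avg_price=("price", "mean"),
         calendar_days=("date", "size"),
         observed_days=("price", "count"))
    .reset_index()
)
yearly["coverage"] = yearly["observed_days"] / yearly["calendar_days"]
print(yearly)
```

With only these four sample rows, 2022 comes out with `observed_days = 0` and a NaN average. That makes it explicit that the null comes from missing input rather than from the averaging step. It also shows that 2024, a leap year, has 366 calendar days.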

My Questions:

  1. What is the best method to handle missing seasonal data like this in Dataiku?
  2. Are there any built-in features or plugins in Dataiku that can help fill missing data based on seasonal trends or interpolation?
  3. If interpolation or seasonal averages aren’t feasible due to limited data points, how can I effectively flag and handle these missing values for further analysis?
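
For questions 2 and 3, one common pattern has three parts: interpolate short gaps, use a seasonal fallback for longer ones, and attach an explicit flag to every value. The sketch below uses plain pandas (`Series.interpolate` with `method="time"`) rather than any specific Dataiku plugin. Several details are illustrative assumptions, not recommendations: the function name `fill_daily_prices`, the 7-day `max_gap_days` threshold, and the month-of-year mean as the seasonal estimate. The function expects one commodity's prices on a complete daily DatetimeIndex, with NaN for missing days.

```python
import numpy as np
import pandas as pd


def fill_daily_prices(s: pd.Series, max_gap_days: int = 7) -> pd.DataFrame:
    """Fill a daily price series (DatetimeIndex, NaN = missing) in two stages.

    1. Time-based linear interpolation, only inside interior gaps of at most
       max_gap_days consecutive missing days (hypothetical threshold).
    2. Seasonal fallback: remaining gaps get the mean of observed values for
       the same calendar month across all years.
    Each row carries a fill_method label so filled values can be filtered out.
    """
    out = pd.DataFrame({"price": s.copy()})
    out["fill_method"] = np.where(s.notna(), "observed", "missing")

    # Length of each run of consecutive missing days.
    is_na = s.isna()
    run_id = is_na.ne(is_na.shift(fill_value=False)).cumsum()
    run_len = is_na.groupby(run_id).transform("sum")
    short_gap = is_na & (run_len <= max_gap_days)

    # Interpolate only between two observations (no edge extrapolation).
    interp = s.interpolate(method="time", limit_area="inside")
    use_interp = short_gap & interp.notna()
    out.loc[use_interp, "price"] = interp[use_interp]
    out.loc[use_interp, "fill_method"] = "interpolated"

    # Seasonal fallback: month-of-year means computed from observed values only.
    month_means = s.groupby(s.index.month).mean()
    seasonal = pd.Series(s.index.month, index=s.index).map(month_means)
    use_seasonal = out["price"].isna() & seasonal.notna()
    out.loc[use_seasonal, "price"] = seasonal[use_seasonal]
    out.loc[use_seasonal, "fill_method"] = "seasonal_mean"

    # Anything still NaN stays flagged as "missing".
    return out


# Small demo: a 2-day interior gap and a 6-day trailing gap, all in January.
demo_idx = pd.date_range("2021-01-01", periods=10, freq="D")
demo = pd.Series([100.0, np.nan, np.nan, 130.0] + [np.nan] * 6, index=demo_idx)
print(fill_daily_prices(demo, max_gap_days=3))
```

In the demo, Jan 2 and Jan 3 are interpolated to 110 and 120, and Jan 5 to Jan 10 fall back to the January mean of 115. Month-of-year means ignore any trend across years, and the Example A sample prices range from 12,440 to 17,030 Rp. For that reason, seasonal fills for a year with no data at all, such as Example B in 2021-2022, are better treated as rough estimates. Keeping `fill_method` lets downstream analyses exclude or down-weight those rows.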

I’d really appreciate your insights and recommendations. Thanks in advance! 😊
