Add missing dates

Options
Santi
Santi Registered Posts: 5

I have a dataset of package orders in a warehouse and I have missing dates because there were 0 orders that day, but I want to have all the dates and the count of orders of 0. Does anyone know how to add missing dates?

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,710 Neuron
    Options

    That is called data densification and there are many ways to do that in Dataiku depending on what technologies you have available. My preference would be to create the date range using pandas in a output only Python recipe:

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE# -*- coding: utf-8 -*-import dataikuimport pandas as pd, numpy as npfrom dataiku import pandasutils as pdu# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODEAll_Dates_df = pd.DataFrame(pd.date_range(start='2020-01-01', end='2024-12-31'), columns=['pd_date'])# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE# Write recipe outputsAll_Dates = dataiku.Dataset("All_Dates")All_Dates.write_with_schema(All_Dates_df)

    Then use a Join recipe to join back this new All_Dates dataset to your dataset using the date column as the joining column and select the join as Outer join to get all matches plus the missing dates from All_Dates. Finally in the Post-join computed columns section of the Join recipe create a new computed column to "merge" the original_date and the All_Dates column into a one column:

    if(isNull(original_date), pd_date, original_date)

    which you can call "combined_date". Result: you now have a dataset with a combined_date column which densified.

Setup Info
    Tags
      Help me…