Add missing dates

Santi
Level 1
Add missing dates

I have a dataset of package orders in a warehouse and I have missing dates because there were 0 orders that day, but I want to have all the dates and the count of orders of 0. Does anyone know how to add missing dates?

0 Kudos
1 Reply
Turribeach

That is called data densification and there are many ways to do that in Dataiku depending on what technologies you have available. My preference would be to create the date range using pandas in a output only Python recipe:

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
All_Dates_df = pd.DataFrame(pd.date_range(start='2020-01-01', end='2024-12-31'), columns=['pd_date'])

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Write recipe outputs
All_Dates = dataiku.Dataset("All_Dates")
All_Dates.write_with_schema(All_Dates_df)

Then use a Join recipe to join back this new All_Dates dataset to your dataset using the date column as the joining column and select the join as Outer join to get all matches plus the missing dates from All_Dates. Finally in the Post-join computed columns section of the Join recipe create a new computed column to "merge" the original_date and the All_Dates column into a one column:

if(isNull(original_date), pd_date, original_date)

which you can call "combined_date". Result: you now have a dataset with a combined_date column which densified.