maintaining time series dataset - adding data file every month

vivekkumar
Level 1

We have a use case where we need to maintain monthly data in a Hive table for reporting:

- A data file is sourced manually every month

- The data file has a column that holds the date (month-end date)

- The requirement is to store the monthly data in a Hive table

- The Hive table should be partitioned by date (month-end date)

- It's like stacking each month's new data onto the Hive table

There is also a requirement to occasionally override a month's data if a monthly override data file arrives.

Please suggest a suitable solution.

1 Reply
AlexT
Dataiker

Hi @vivekkumar,
Sounds like you could just use partition redispatch.

Simply add your new data files to the input dataset and add a sync recipe with "redispatch" mode; the output dataset will then be partitioned by month.

Re-run the sync recipe every month after adding your manually sourced file to the input dataset's list of files. For the input, you can either use a folder or add the new file to an existing dataset.
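To make the idea concrete, here is a minimal plain-Python sketch of what redispatching by month means conceptually. It is not Dataiku's implementation and uses no Dataiku API. The names `month_partition`, `redispatch`, and the `month_end_date` column are hypothetical, and the overwrite-per-partition behaviour is an assumption about how you would want the override files handled.

```python
from collections import defaultdict
from datetime import date


def month_partition(d: date) -> str:
    """Map a month-end date to a month partition id such as '2024-01'."""
    return f"{d.year:04d}-{d.month:02d}"


def redispatch(table: dict, rows: list, date_column: str = "month_end_date") -> list:
    """Route each row to its month partition inside `table`.

    `table` is a plain dict {partition_id: [rows]} standing in for the
    partitioned output dataset. Assumption: any partition touched by the
    incoming rows is fully replaced, which is what lets an override file
    supersede the month it covers.
    """
    incoming = defaultdict(list)
    for row in rows:
        incoming[month_partition(row[date_column])].append(row)
    for partition_id, partition_rows in incoming.items():
        table[partition_id] = partition_rows  # overwrite, do not append
    return sorted(incoming)  # ids of the partitions that were written


# Regular monthly loads stack new partitions.
table = {}
redispatch(table, [{"month_end_date": date(2024, 1, 31), "value": 10}])
redispatch(table, [{"month_end_date": date(2024, 2, 29), "value": 20}])

# An override file for January replaces only the January partition.
redispatch(table, [{"month_end_date": date(2024, 1, 31), "value": 99}])
```

After these three calls, `table` holds two partitions: `2024-01` with the override row and `2024-02` untouched. In the actual flow, the equivalent of the override step would likely be replacing the original file for that month in the input before re-running the recipe, so that the old and new rows do not both land in the same partition.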

https://knowledge.dataiku.com/latest/mlops-o16n/partitioning/concept-redispatch.html

Kind Regards,
