We have a use case where we need to maintain monthly data in a Hive table for reporting:
- Every month, a data file is sourced manually
- The data file has a column holding the date (month-end date)
- The requirement is to store the monthly data in a Hive table
- The Hive table should be partitioned by date (month-end date)
- It's like stacking new data onto the Hive table
There is also a requirement to occasionally override a month's data if a monthly override data file arrives.
Please suggest a suitable solution.
Hi @vivekkumar,
Sounds like you could just use redispatch.
Simply add your new data files to the input dataset, add a Sync recipe with "redispatch" mode, and the output dataset will be partitioned by month.
Re-run the Sync recipe every month after adding your manually sourced files to the input dataset. The input can be a folder that you drop new files into, or an existing dataset whose list of files you edit or extend.
https://knowledge.dataiku.com/latest/mlops-o16n/partitioning/concept-redispatch.html
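To make the idea concrete, here is a minimal plain-Python sketch of what redispatching amounts to conceptually: rows are routed to a partition derived from their month-end date, and any partition touched by incoming rows is rewritten. It does not call any Dataiku or Hive API; the function name `redispatch`, the `month_end` column name, and the `YYYY-MM` partition key format are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def redispatch(rows, partitions=None):
    """Route rows to monthly partitions keyed by 'YYYY-MM'.

    Hypothetical helper for illustration only. Any partition that
    receives incoming rows is replaced wholesale; partitions that are
    not touched are carried over unchanged.
    """
    incoming = defaultdict(list)
    for row in rows:
        # Derive the partition key from the (assumed) month_end column.
        incoming[row["month_end"].strftime("%Y-%m")].append(row)

    result = dict(partitions or {})
    result.update(dict(incoming))  # overwrite only the touched partitions
    return result


# Initial load: two monthly files stacked into the table.
jan = [{"month_end": date(2024, 1, 31), "value": 10}]
feb = [{"month_end": date(2024, 2, 29), "value": 20}]
table = redispatch(jan + feb)

# Later, an override file for January arrives: only that partition changes.
override = [{"month_end": date(2024, 1, 31), "value": 99}]
table = redispatch(override, table)
```

After the override, the January partition holds only the override rows while February is untouched, which matches the "stack new months, occasionally replace one" requirement in the question.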
Kind Regards,