I'm working on a Hive ORC table. I want to use a Hive recipe to change some formats and column names, and to partition the table on a column I created (YYYY-MM-DD) so that it is efficient.
The output is Parquet with a Hive view.
(I've modified the output with the settings / partitioning options mentioned in the videos.)
But I cannot find a way to sync this with the Hive recipe...
Thank you again for your help.
Best regards, Tate
Update: with a Prepare recipe I can partition, but only one partition at a time... Do I have to create a scenario and iterate to create all my partitions?
Also, how do you limit the Parquet file size (optimized blocks of 256 MB or 128 MB)?
Tate, I think you are looking for partition redispatching: with a Sync recipe you can check the box "Redispatch partitioning according to input columns" after enabling partitioning on the target dataset. Redispatching reads the partitioning column of the input dataset and writes one partition for each value it finds as the sync job runs, so you do not need to build the partitions one at a time.
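
To make the idea concrete, here is a minimal Python sketch of what redispatching amounts to conceptually: one pass over the input rows, with each row routed to the partition named by its day value. This is only an illustration, not how DSS implements it; the function name `redispatch`, the column name `created_at`, and the sample rows are all made up for the example.

```python
from collections import defaultdict
from datetime import datetime


def redispatch(rows, timestamp_column):
    """Group rows into partitions keyed by day (YYYY-MM-DD).

    Hypothetical helper: it mimics the effect of redispatching,
    not the actual DSS implementation.
    """
    partitions = defaultdict(list)
    for row in rows:
        # Derive the partition identifier from the row's own timestamp.
        day = datetime.fromisoformat(row[timestamp_column]).strftime("%Y-%m-%d")
        partitions[day].append(row)
    return dict(partitions)


# Made-up sample rows standing in for the unpartitioned input table.
rows = [
    {"id": 1, "created_at": "2021-03-01T08:15:00"},
    {"id": 2, "created_at": "2021-03-01T17:40:00"},
    {"id": 3, "created_at": "2021-03-02T09:05:00"},
]

partitions = redispatch(rows, "created_at")
for day in sorted(partitions):
    print(day, [r["id"] for r in partitions[day]])
```

The point is that a single run produces every partition present in the data, which is why no scenario loop over partitions is needed.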