Hi guys,
I'm working on a Hive ORC table. For efficiency, I want to use a Hive recipe to change some formats or column names and to partition the table on a derived column (YYYY-MM-DD).
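To make "a derived column (YYYY-MM-DD)" concrete, here is a minimal plain-Python sketch of turning a timestamp into a day-level partition key. The column names `event_ts` and `day` and the helper `partition_key` are hypothetical. In the actual Hive recipe this would be an SQL expression on the source column, not Python.

```python
from datetime import datetime


def partition_key(ts: str) -> str:
    """Turn an ISO-8601 timestamp string into a day-level key (YYYY-MM-DD)."""
    # fromisoformat parses strings such as "2021-03-14T09:26:53"
    return datetime.fromisoformat(ts).strftime("%Y-%m-%d")


# Hypothetical input rows with a timestamp column named event_ts
rows = [
    {"id": 1, "event_ts": "2021-03-14T09:26:53"},
    {"id": 2, "event_ts": "2021-03-15T23:59:59"},
]
for row in rows:
    # Add the derived partitioning column to each row
    row["day"] = partition_key(row["event_ts"])

print([r["day"] for r in rows])  # ['2021-03-14', '2021-03-15']
```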
The output is Parquet, with a Hive view.
(I've configured the output's partitioning settings as shown in the videos.)
But I cannot find a way to sync this with the Hive recipe...
Thank you again for your help.
Best regards, Tate
Update: with a Prepare recipe I can partition, but only one partition at a time. Do I have to create a scenario and iterate over it to build all my partitions?
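If you do go the "iterate over every partition" route, the loop needs the list of day partition identifiers to build. A minimal sketch, assuming day-level identifiers in YYYY-MM-DD form; `daily_partitions` is a hypothetical helper, and the loop body stands in for whatever actually builds one partition (this is not a Dataiku scenario API).

```python
from datetime import date, timedelta
from typing import List


def daily_partitions(start: date, end: date) -> List[str]:
    """Return every day from start to end (inclusive) as a YYYY-MM-DD string."""
    n_days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(n_days + 1)]


for partition_id in daily_partitions(date(2021, 3, 1), date(2021, 3, 3)):
    # Hypothetical: build or sync this single partition here
    print(partition_id)
```

Running it prints `2021-03-01`, `2021-03-02` and `2021-03-03`, one per line.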
Also, how do you limit your Parquet file size (optimized blocks of 256 MB or 128 MB)?
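Whatever mechanism ends up controlling the output, the sizing itself is simple arithmetic: divide the expected data volume by the target file size and round up to get the number of output files to aim for. A minimal sketch, assuming you already know (or estimate) the compressed output size; `files_needed` is a hypothetical helper, and real Parquet sizes depend on compression and encoding.

```python
import math

MB = 1024 * 1024


def files_needed(total_bytes: int, target_bytes: int) -> int:
    """Number of output files of at most target_bytes needed for total_bytes of data."""
    # Round up so no file exceeds the target; always produce at least one file
    return max(1, math.ceil(total_bytes / target_bytes))


print(files_needed(1000 * MB, 256 * MB))  # 4
print(files_needed(1000 * MB, 128 * MB))  # 8
```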
Hi @Tate_fr,
I'm not sure I understand your problem. What do you mean by "But I cannot find a way to sync this with the Hive recipe..."? What is it that you can't sync?
Cheers
Tate, I think you are looking for partition redispatching: in a Sync recipe, after enabling partitioning on the target dataset, you can check the box "Redispatch partitioning according to input columns". Redispatching traverses the partitioning column of the input dataset and, as the sync job runs, writes one partition for each value it finds.
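Conceptually, redispatching is a group-by on the partitioning column: every distinct value becomes its own output partition, all in one pass instead of one job per partition. The sketch below only illustrates that idea in plain Python; it is not how Dataiku implements it, and `redispatch` and the `day` column are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def redispatch(rows: Iterable[dict], partition_column: str) -> Dict[str, List[dict]]:
    """Group input rows by the value of partition_column: one bucket per partition."""
    partitions: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        # Each distinct value of the column becomes its own output partition
        partitions[row[partition_column]].append(row)
    return dict(partitions)


rows = [
    {"id": 1, "day": "2021-03-14"},
    {"id": 2, "day": "2021-03-15"},
    {"id": 3, "day": "2021-03-14"},
]
for partition_id, part_rows in sorted(redispatch(rows, "day").items()):
    # Hypothetical: a real sync job would write part_rows into partition partition_id
    print(partition_id, [r["id"] for r in part_rows])
```

This prints `2021-03-14 [1, 3]` followed by `2021-03-15 [2]`.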