
Hive recipe to partition in Hive Parquet


Hi guys,

I'm working on a Hive ORC table. I want to use a Hive recipe to change some formats and column names, and to partition the table by a derived date column (YYYY-MM-DD) for efficiency.

The output is Parquet with a Hive view.

(I've modified the output using the settings / partitioning steps mentioned in the videos.)

But I cannot find a way to sync this with the Hive recipe...

Thank you again for your help.

Best regards, Tate

Update: with a Prepare recipe I can partition, but only one partition at a time. Do I have to create a scenario and iterate to build all my partitions?

Also, how do you limit your Parquet file size (optimized blocks of 256 MB or 128 MB)?


2 Replies

Hi @Tate_fr,

I'm not sure I understand your problem. What do you mean by "But I cannot find a way to sync this with the Hive recipe..."? What exactly can't you sync?



Tate, I think you are looking for partition redispatching: with a Sync recipe, you can check the box "Redispatch partitioning according to input columns" after enabling partitioning on the target dataset. Redispatching reads the partitioning column of the input dataset and writes one partition for each distinct value it finds as the sync job runs.
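To make the idea concrete, here is a minimal plain-Python sketch of what redispatching amounts to conceptually: derive a YYYY-MM-DD value per row and route each row to the partition with that value. This is not DSS or Hive code; the `event_time` column, the sample rows, and the function names are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(ts: str) -> str:
    """Derive a YYYY-MM-DD partition value from an ISO-8601 timestamp string."""
    return datetime.fromisoformat(ts).strftime("%Y-%m-%d")


def redispatch(rows, ts_column="event_time"):
    """Group rows by the day derived from ts_column (a hypothetical column name).

    Returns a dict mapping each partition value to the list of rows it holds,
    which mirrors "one partition per distinct value" in a single pass.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row[ts_column])].append(row)
    return dict(partitions)


# Hypothetical sample input standing in for the unpartitioned source table.
rows = [
    {"id": 1, "event_time": "2021-03-01T08:15:00"},
    {"id": 2, "event_time": "2021-03-01T17:40:00"},
    {"id": 3, "event_time": "2021-03-02T09:05:00"},
]

result = redispatch(rows)
for day, part in sorted(result.items()):
    print(day, [r["id"] for r in part])
```

The point is that all partitions come out of one pass over the input, so there is no need to loop over partitions in a scenario.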

Adam Gugliciello
Senior Solution Engineer
+ 1 (201) 398 - 6036
