Hive recipe to partition in Hive Parquet

Tate_fr
Level 2

Hi guys,

I'm working on a Hive ORC table. I want to use a Hive recipe to change some formats or column names and to partition the table on a derived column (YYYY-MM-DD) so that it is more efficient to query.
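
Deriving the YYYY-MM-DD partition value from a timestamp is a simple date truncation. The sketch below is plain Python, not DSS or Hive syntax, and assumes the source column holds ISO-8601 timestamp strings; the function name `partition_key` is hypothetical.

```python
from datetime import datetime

def partition_key(ts: str) -> str:
    """Derive a day-level partition value (YYYY-MM-DD) from an ISO-8601 timestamp string."""
    # Parse the raw timestamp, then keep only the calendar date part.
    return datetime.fromisoformat(ts).strftime("%Y-%m-%d")

print(partition_key("2021-03-15T08:42:00"))  # 2021-03-15
```

In a Hive recipe the same truncation would be written in SQL on the timestamp column; the point is only that every row maps to exactly one day value.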

The output dataset is in Parquet format, with a Hive view.

(I've configured partitioning on the output dataset in its settings, as shown in the videos.)

But I cannot find a way to sync this with the Hive recipe...

Thank you again for your help.

Best regards, Tate

Update: with a Prepare recipe I can partition the output, but only one partition at a time. Do I have to create a scenario and iterate to build all my partitions?
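
If you do go the scenario route, the loop is over one partition identifier per day. A minimal sketch of generating that list, assuming day-level partition identifiers in YYYY-MM-DD form; `daily_partitions` is a hypothetical helper, not a DSS API, and the redispatch option described in the reply below avoids this loop altogether.

```python
from datetime import date, timedelta
from typing import List

def daily_partitions(start: date, end: date) -> List[str]:
    """Return one YYYY-MM-DD identifier per day from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]

# A scenario could iterate over these identifiers and build one partition per step.
for pid in daily_partitions(date(2021, 1, 30), date(2021, 2, 2)):
    print(pid)
```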

Also, how do you limit your Parquet file size (e.g. optimized blocks of 256 MB or 128 MB)?
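
As a back-of-the-envelope check, the number of files a partition should be split into is its size divided by the target file size, rounded up. This is only the arithmetic, under the assumption that file size scales linearly with row count (compression makes real Parquet sizes vary); it does not show how Hive or DSS actually enforce a size, and `files_needed` is a hypothetical helper.

```python
MIB = 1024 * 1024

def files_needed(partition_bytes: int, target_file_bytes: int = 128 * MIB) -> int:
    """Smallest number of files so that no file exceeds target_file_bytes (at least 1)."""
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding issues.
    return max(1, -(-partition_bytes // target_file_bytes))

# Hypothetical 1 GiB partition: 4 files at 256 MiB, 8 files at 128 MiB.
print(files_needed(1024 * MIB, 256 * MIB))  # 4
print(files_needed(1024 * MIB, 128 * MIB))  # 8
```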

Ignacio_Toledo

Hi @Tate_fr,

I'm not sure I understand your problem. What do you mean by "But I cannot find a way to sync this with the Hive recipe..."? What is it that you can't sync?

Cheers

eschatus
Dataiker

Tate, I think you are looking to redispatch partitioning: with a Sync recipe, you can check the box "Redispatch partitioning according to input columns" after enabling partitioning on the target dataset. Redispatching reads the partitioning column of the input dataset and, as the Sync job runs, writes a partition for each distinct value it finds.
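
Conceptually, redispatching is a group-by on the partitioning column: each row lands in the partition named by its value, so one job produces all partitions. The plain-Python sketch below only illustrates that idea; it is not how DSS implements it, and the column name `day` and the `redispatch` function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def redispatch(rows: Iterable[dict], column: str) -> Dict[str, List[dict]]:
    """Group rows into partitions keyed by the value of `column`."""
    partitions: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)

rows = [
    {"day": "2021-03-01", "amount": 10},
    {"day": "2021-03-02", "amount": 5},
    {"day": "2021-03-01", "amount": 7},
]
parts = redispatch(rows, "day")
print(sorted(parts))             # ['2021-03-01', '2021-03-02']
print(len(parts["2021-03-01"]))  # 2
```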


Adam Gugliciello
Senior Solution Engineer
+ 1 (201) 398 - 6036
adam.gugliciello@dataiku.com
www.dataiku.com
