Hive recipe to partition in Hive Parquet

Tate_fr · ‎03-24-2021

Hi guys,

I'm working on a Hive ORC table. I want to use a Hive recipe to change some formats and column names, and to partition the table on a derived column (YYYY-MM-DD) for efficiency.

The output is Parquet with a Hive view.

(I've modified the output with the settings / partitioning steps mentioned in the videos.)

But I cannot find a way to sync this with the Hive recipe...

Thank you again for your help.

Best regards, Tate

Update: with a prepare recipe I can partition, but it only builds one partition. Do I have to create a scenario and iterate to create all my partitions?

Also, how do you limit the Parquet file size (optimized blocks of 256 MB or 128 MB)?

Ignacio_Toledo · ‎03-31-2021

Hi @Tate_fr,

I'm not sure I understand your problem. What do you mean by "But I cannot find a way to sync this with the Hive recipe..."? What exactly can't you sync?

Cheers

eschatus · ‎05-25-2021

Tate, I think you are looking for redispatch partitioning: after enabling partitioning on the target dataset, you can check the "Redispatch partitioning according to input columns" box in a sync recipe. As the sync job runs, redispatching traverses the partitioning column of the input dataset and writes a partition for each value it finds.
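To make the idea concrete, here is a minimal plain-Python sketch of what redispatching amounts to conceptually: derive a YYYY-MM-DD value per row and route each row to the partition for that value. This is not the DSS implementation or its API. The `event_time` column, the sample rows, and the `day_partition` / `redispatch` helpers are all hypothetical and only serve the illustration.

```python
from collections import defaultdict
from datetime import datetime


def day_partition(ts: str) -> str:
    """Derive a YYYY-MM-DD partition value from an ISO-8601 timestamp string."""
    return datetime.fromisoformat(ts).strftime("%Y-%m-%d")


def redispatch(rows, ts_column):
    """Group rows by the day derived from ts_column (hypothetical helper).

    Returns a dict mapping each partition value to the rows that belong to it,
    so a single pass over the input fills every partition.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[day_partition(row[ts_column])].append(row)
    return dict(partitions)


# Hypothetical sample data standing in for the unpartitioned input table.
rows = [
    {"id": 1, "event_time": "2021-03-24T08:15:00"},
    {"id": 2, "event_time": "2021-03-24T17:40:00"},
    {"id": 3, "event_time": "2021-03-25T09:05:00"},
]

parts = redispatch(rows, "event_time")
for day in sorted(parts):
    print(day, [r["id"] for r in parts[day]])
```

The point of the sketch is that one run over the whole input fills every partition, which is why a scenario that iterates partition by partition should not be needed for the initial load.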

Adam Gugliciello
Senior Solution Engineer
+ 1 (201) 398 - 6036
adam.gugliciello@dataiku.com
www.dataiku.com
