Hive recipe to partition a Hive Parquet table

Tate_fr Registered Posts: 8 ✭✭✭✭

Hi guys,

I'm working on a Hive ORC table. I want to use a Hive recipe to change some formats and column names, and to partition the table by a derived column (YYYY-MM-DD) for efficiency.

The output is Parquet with a Hive view.

(I've modified the output using the settings / partitioning steps mentioned in the videos.)

But I cannot find a way to sync this with the Hive recipe...

Thank you again for your help.

Best regards, Tate

Update: with a Prepare recipe I can partition, but only one partition at a time. Do I have to create a scenario and iterate to build all my partitions?

Also, how do you limit the Parquet file size (e.g. optimized blocks of 256 MB or 128 MB)?

Answers

  • Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron

    Hi @Tate_fr,

    I'm not sure I understand your problem. What do you mean by "But I cannot find a way to sync this with the Hive recipe..."? What exactly can't you sync?

    Cheers

  • eschatus Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1 Dataiker

    Tate, I think you are looking for partition redispatching: with a Sync recipe, you can check the box "Redispatch partitioning according to input columns" after enabling partitioning on the target dataset. Redispatching reads the partitioning column of the input dataset and writes one partition for each distinct value it finds as the sync job runs.
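
To make the idea of redispatching concrete, here is a minimal pure-Python sketch of the concept only. It is not how DSS implements the Sync recipe; the function name `redispatch`, the `event_ts` column, and the sample rows are all hypothetical. It derives a YYYY-MM-DD value from a timestamp column and routes each row to the partition for that day, so a single pass over the input fills every partition.

```python
from collections import defaultdict
from datetime import datetime


def redispatch(rows, timestamp_key="event_ts"):
    """Group rows into partitions keyed by a derived YYYY-MM-DD value.

    `rows` is an iterable of dicts; `timestamp_key` names a column holding
    ISO 8601 timestamps (both are assumptions made for this illustration).
    """
    partitions = defaultdict(list)
    for row in rows:
        # Derive the partitioning value from the timestamp column.
        day = datetime.fromisoformat(row[timestamp_key]).strftime("%Y-%m-%d")
        # Route the row to its partition, keeping the derived column.
        partitions[day].append({**row, "day": day})
    return dict(partitions)


# Hypothetical sample input spanning two days.
rows = [
    {"id": 1, "event_ts": "2021-03-01T08:15:00"},
    {"id": 2, "event_ts": "2021-03-01T19:40:00"},
    {"id": 3, "event_ts": "2021-03-02T00:05:00"},
]

partitions = redispatch(rows)
for day, part in sorted(partitions.items()):
    print(day, [r["id"] for r in part])
```

The point of the sketch is that no per-partition iteration is needed: every distinct day found in the input becomes its own partition during one run, which is the behaviour the checkbox above is described as providing.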
