HDFS - Force Parquet as default settings for recipe output

Charly

Greetings!

I'm currently on a platform running Dataiku 11.3.1 and writing datasets to HDFS. Our IT department requires all datasets to be written in Parquet, but the default format is CSV (Hive), which can cause errors.

Is there a way to configure the connection so that Parquet is the default format?

Best regards,


Best Answer

  • Alexandru

    Hi @Charly,
    You can configure the instance-level preferred format under Administration -> "Preferred storage formats" and place PARQUET_HIVE as the first option.

    This can also be controlled at the project level by overriding the global dataset creation settings.
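    To illustrate why moving PARQUET_HIVE to the top of the list changes the default, here is a minimal, hypothetical Python sketch. It assumes DSS picks the first format in the preferred list that the target connection supports; it does not call any Dataiku API, and the identifiers other than PARQUET_HIVE (CSV_HIVE, ORC) and the helper names are illustrative only.

```python
from typing import Iterable, List, Optional, Sequence


def promote_format(preferred: Sequence[str], fmt: str) -> List[str]:
    """Return a new preference list with `fmt` moved to the front (hypothetical helper)."""
    return [fmt] + [f for f in preferred if f != fmt]


def resolve_default(preferred: Sequence[str], supported: Iterable[str]) -> Optional[str]:
    """Return the first preferred format the connection supports.

    This "first supported format wins" rule is an assumption used for illustration.
    """
    supported_set = set(supported)
    for f in preferred:
        if f in supported_set:
            return f
    return None  # no preferred format is usable on this connection


# Illustrative identifiers only; check the real names in your DSS instance.
formats = ["CSV_HIVE", "PARQUET_HIVE", "ORC"]
hdfs_supported = {"CSV_HIVE", "PARQUET_HIVE", "ORC"}

print(resolve_default(formats, hdfs_supported))  # CSV_HIVE
formats = promote_format(formats, "PARQUET_HIVE")
print(resolve_default(formats, hdfs_supported))  # PARQUET_HIVE
```

    Under this assumption, reordering the instance-level list is enough: no per-connection change is needed, because every HDFS connection that supports Parquet will resolve to it first.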

Answers

  • Charly

    Thanks @AlexT, I was searching in the individual connection settings.

    Have a nice day!
