HDFS - Force Parquet as default settings for recipe output
Charly
Greetings!
I'm currently on a platform running Dataiku 11.3.1 and writing datasets to HDFS. IT requires all datasets to be written in Parquet, but the default format is CSV (Hive), which can cause errors.
Is there a way to configure the connection so that Parquet is the default format?
Best regards,
Best Answer
Alexandru (Dataiker)
Hi @Charly,
You can set the instance-level preferred format under Administration -> "Preferred storage formats" by placing PARQUET_HIVE as the first option.
This can also be controlled at the project level by overriding the global dataset creation settings.
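If you prefer to script the change rather than click through the UI, the sketch below shows the idea. It assumes the preferred storage formats are kept as an ordered list of format identifiers in the instance's general settings. `DSSClient`, `get_general_settings()`, `get_raw()` and `save()` come from the public `dataikuapi` package, but the name of the settings field holding the list is NOT confirmed here. That is why `apply_to_instance` (a hypothetical helper) takes it as a parameter, so you can look it up in `get_raw()` first. Apart from PARQUET_HIVE, the format identifiers in the usage line are illustrative only.

```python
from typing import List

PREFERRED = "PARQUET_HIVE"  # identifier quoted in the answer above


def put_format_first(formats: List[str], preferred: str = PREFERRED) -> List[str]:
    """Return a copy of `formats` with `preferred` moved (or added) to the front.

    The relative order of the other formats is left unchanged.
    """
    rest = [f for f in formats if f != preferred]
    return [preferred] + rest


def apply_to_instance(host: str, api_key: str, field: str) -> None:
    """Hypothetical helper: reorder the preferred-formats list in the DSS general settings.

    Assumption: raw[field] is a list of format identifier strings. The real field
    name is not confirmed, so inspect settings.get_raw() before calling this.
    Requires the `dataikuapi` package and an admin API key.
    """
    import dataikuapi  # imported lazily so put_format_first works without it

    client = dataikuapi.DSSClient(host, api_key)
    settings = client.get_general_settings()
    raw = settings.get_raw()  # the dict that save() will send back to DSS
    raw[field] = put_format_first(raw[field])
    settings.save()


# Usage (illustrative identifiers):
# put_format_first(["CSV_HIVE", "PARQUET_HIVE", "ORC"]) returns ["PARQUET_HIVE", "CSV_HIVE", "ORC"]
```

Keeping the reordering in a pure function makes it easy to check locally before anything is written back to the instance. The same list handling would apply to a project-level override if it turns out to use the same structure, but that has not been verified.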