
Enabling parquet format in Dataiku DSS

Solved!
Ankur30
Level 2

Hi

Currently, when we write to the Dataiku filesystem, we only have the CSV and Avro formats available.

How can I enable the Parquet format in Dataiku DSS running on Linux on an EC2 instance?

I need the steps for that. Also, we don't have any HDFS connection set up.

Regards,

Ankur.

AlexT
Dataiker

Hi Ankur,

To support Parquet files on a non-Hadoop install, you will need to install the Hadoop integration with the standalone libraries. Please review https://doc.dataiku.com/dss/latest/connecting/formats/parquet.html#applicability to see the restrictions that apply to Parquet.

The steps to install are described here:

https://doc.dataiku.com/dss/latest/containers/setup-k8s.html#optional-setup-spark

Download the standalone libs from the URL below (if you are on a different version, change the version in the URL): https://downloads.dataiku.com/public/studio/9.0.5/dataiku-dss-hadoop-standalone-libs-generic-hadoop3...

Then, from the DSS data directory, run:

./bin/dssadmin install-hadoop-integration -standaloneArchive /PATH/TO/dataiku-dss-hadoop3-standalone-libs-generic...tar.gz
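
For reference, here is a minimal sketch of how the whole sequence could look as a script. It assumes the usual stop, install, start order for DSS integrations and that the commands are run from the DSS data directory. `DSS_DATADIR`, `STANDALONE_ARCHIVE`, `DRY_RUN`, `run` and `install_parquet_support` are placeholder names made up for this example, and the archive path is left elided exactly as in the command above, so point it at the file you actually downloaded. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to really run them.
# DSS_DATADIR and STANDALONE_ARCHIVE are placeholders (assumptions), not real paths.
DSS_DATADIR="${DSS_DATADIR:-/PATH/TO/DSS_DATADIR}"
STANDALONE_ARCHIVE="${STANDALONE_ARCHIVE:-/PATH/TO/dataiku-dss-hadoop3-standalone-libs-generic...tar.gz}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_parquet_support() {
    # The dssadmin command is run from the DSS data directory.
    run cd "$DSS_DATADIR" || return 1
    # Assumption: DSS is stopped while the integration is installed.
    run ./bin/dss stop || return 1
    run ./bin/dssadmin install-hadoop-integration \
        -standaloneArchive "$STANDALONE_ARCHIVE" || return 1
    run ./bin/dss start
}

install_parquet_support
```

Run it once as is to check the printed commands, then again with `DRY_RUN=0` and both paths set (as the Linux user that owns the DSS install) to apply the change.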

 

Let me know if you have any issues. 

 

Ankur30
Level 2
Author

Thanks @AlexT for the prompt response. I will follow the steps you mentioned and accept this as the solution once I have been able to configure the Parquet format.

Thank you.

 

Regards,

Ankur.
