Error uploading Parquet file from AWS S3
Hi,
I need to upload a Parquet file from my AWS S3 bucket, but I'm getting this error message:
Oops: an unexpected error occurred
org/apache/hadoop/conf/Configuration, caused by: ClassNotFoundException: org.apache.hadoop.conf.Configuration
Question: Is it possible to upload Parquet files with Dataiku? Which file types can be uploaded from AWS S3?
BTW, I'm able to upload CSV files from my AWS S3 bucket.
Best Answer
Hello Carl, Parquet is a format that needs the Hadoop libraries to work, and this DSS instance can't find them. The error below indicates that the Hadoop integration is broken or has not been set up.
org/apache/hadoop/conf/Configuration, caused by: ClassNotFoundException: org.apache.hadoop.conf.Configuration
Please make sure to run the ./bin/dssadmin install-hadoop-integration script after DSS has been installed or upgraded, or after the Hadoop cluster has been upgraded. More details are available in our documentation.
Answers
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi @Carl,
To use Parquet files in DSS, you must first set up the Hadoop integration.
https://doc.dataiku.com/dss/latest/hadoop/installation.html#setting-up-dss-hadoop-integration
If you don't have Hadoop installed on the DSS machine, you can use the standalone Hadoop libraries archive available here:
https://downloads.dataiku.com/public/studio/10.0.7/dataiku-dss-hadoop-standalone-libs-generic-hadoop3-10.0.7.tar.gz
First download dataiku-dss-hadoop-standalone-libs-generic-hadoop3-10.0.7.tar.gz, then stop DSS and run:
DATADIR/bin/dssadmin install-hadoop-integration -standaloneArchive dataiku-dss-hadoop-standalone-libs-generic-hadoop3-10.0.7.tar.gz
Start DSS, and then you should be able to upload Parquet files.
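For reference, the whole sequence could be scripted roughly as follows. This is only a sketch under a few assumptions: the DATADIR path is a placeholder you must replace, curl is used as the downloader (any other tool works), dssadmin and dss are taken to live under DATADIR/bin as in the ./bin/dssadmin invocation quoted in the accepted answer, and the version 10.0.7 is simply the one from the link above (presumably it should match your installed DSS version). The run helper and the DRY_RUN switch are hypothetical additions for illustration, not part of DSS; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: install the standalone Hadoop libraries into DSS.
# Assumption: DATADIR below is a placeholder, point it at your DSS data directory.
DSS_VERSION="10.0.7"
ARCHIVE="dataiku-dss-hadoop-standalone-libs-generic-hadoop3-${DSS_VERSION}.tar.gz"
URL="https://downloads.dataiku.com/public/studio/${DSS_VERSION}/${ARCHIVE}"
DATADIR="${DATADIR:-/path/to/DATADIR}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = only print, 0 = really execute

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_standalone_hadoop_libs() {
    run curl -fLO "$URL" &&                  # download the archive into the current directory
    run "$DATADIR/bin/dss" stop &&           # DSS must be stopped before installing
    run "$DATADIR/bin/dssadmin" install-hadoop-integration -standaloneArchive "$ARCHIVE" &&
    run "$DATADIR/bin/dss" start             # restart DSS once the libraries are in place
}

install_standalone_hadoop_libs
```

Run it once as-is to review the printed commands, then rerun with DRY_RUN=0 and DATADIR set to your data directory to actually execute them.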
Let me know if that helps.