Error Uploading parquet file from AWS S3

Solved!
Carl
Level 3
Hi, 

I need to upload a Parquet file from my AWS S3 bucket, but I'm getting this error message:

 Oops: an unexpected error occurred

org/apache/hadoop/conf/Configuration, caused by: ClassNotFoundException: org.apache.hadoop.conf.Configuration

Question: Is it possible to upload Parquet files with Dataiku? Which file types can be uploaded from AWS S3?

BTW, I'm able to upload CSV files from my AWS S3 bucket.

SantoshK
Dataiker
Hello Carl,
 
Parquet is a format that requires the Hadoop libraries, and this DSS instance cannot find them. The error below indicates that the Hadoop integration is broken or has not been set up.
 
org/apache/hadoop/conf/Configuration, caused by: ClassNotFoundException: org.apache.hadoop.conf.Configuration
 
Please make sure to run the ./bin/dssadmin install-hadoop-integration script after DSS has been installed or upgraded, or after the Hadoop cluster has been upgraded.
 
More details are available in our doc:

AlexT
Dataiker

Hi @Carl ,

To use Parquet files in DSS you must first run the Hadoop integration. 

https://doc.dataiku.com/dss/latest/hadoop/installation.html#setting-up-dss-hadoop-integration

If you don't have Hadoop installed on the DSS machine, you can use the standalone Hadoop libraries archive (passed via -standaloneArchive), available here:

https://downloads.dataiku.com/public/studio/10.0.7/dataiku-dss-hadoop-standalone-libs-generic-hadoop...

First download dataiku-dss-hadoop-standalone-libs-generic-hadoop3-10.0.7.tar.gz, then stop DSS and run:

DATADIR/bin/dssadmin install-hadoop-integration -standaloneArchive dataiku-dss-hadoop-standalone-libs-generic-hadoop3-10.0.7.tar.gz

Start DSS again, and then you should be able to upload Parquet files.
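
For reference, the stop / install / start sequence could be scripted roughly like this. It is only a sketch under a few assumptions: the DATADIR value is a placeholder for your own DSS data directory, the archive is assumed to sit in the current directory, dssadmin is assumed to live under DATADIR/bin (as in the ./bin/dssadmin path mentioned in this thread), and the DRY_RUN switch and function names are made up for illustration. By default it only prints the commands so you can check them first.

```shell
#!/bin/sh
# Sketch only. DATADIR and ARCHIVE are placeholders (assumptions): adjust to your install.
DATADIR="${DATADIR:-/path/to/DATADIR}"
ARCHIVE="${ARCHIVE:-dataiku-dss-hadoop-standalone-libs-generic-hadoop3-10.0.7.tar.gz}"
# Hypothetical safety switch: DRY_RUN=1 (default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop DSS, install the standalone Hadoop libraries, then start DSS again.
# Stops at the first failing step so DSS is not restarted on a failed install.
install_standalone_hadoop_libs() {
    run "$DATADIR/bin/dss" stop || return 1
    run "$DATADIR/bin/dssadmin" install-hadoop-integration -standaloneArchive "$ARCHIVE" || return 1
    run "$DATADIR/bin/dss" start
}

install_standalone_hadoop_libs
```

Run it once as-is to review the printed commands, then with DRY_RUN=0 (as the user that owns the DSS installation) to actually apply them.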

 

Let me know if that helps. 
