Error Uploading parquet file from AWS S3

Solved!
Carl
Level 3
Hi, 

I need to upload a Parquet file from my AWS S3 bucket, but I'm getting this error message:

 Oops: an unexpected error occurred

org/apache/hadoop/conf/Configuration, caused by: ClassNotFoundException: org.apache.hadoop.conf.Configuration

Question: Is it possible to upload Parquet files with Dataiku? Which file types can be uploaded from AWS S3?

BTW, I'm able to upload CSV files from my AWS S3 bucket.

SantoshK
Dataiker
Hello Carl,
 
Parquet is a format that needs the Hadoop libraries to work, and this DSS instance can't find them. The error below indicates that the Hadoop integration is broken or has not been set up.
 
org/apache/hadoop/conf/Configuration, caused by: ClassNotFoundException: org.apache.hadoop.conf.Configuration
 
Please make sure to run the ./bin/dssadmin install-hadoop-integration script after DSS has been installed or upgraded, or after the Hadoop cluster has been upgraded.
 
More details are available in our doc:

AlexT
Dataiker

Hi @Carl ,

To use Parquet files in DSS, you must first set up the Hadoop integration.

https://doc.dataiku.com/dss/latest/hadoop/installation.html#setting-up-dss-hadoop-integration

If you don't have Hadoop installed on the DSS machine, you can use the standalone Hadoop libraries archive (the -standaloneArchive option), available here:

https://downloads.dataiku.com/public/studio/10.0.7/dataiku-dss-hadoop-standalone-libs-generic-hadoop...

First download dataiku-dss-hadoop-standalone-libs-generic-hadoop3-10.0.7.tar.gz, then stop DSS and run:

DATADIR/bin/dssadmin install-hadoop-integration -standaloneArchive dataiku-dss-hadoop-standalone-libs-generic-hadoop3-10.0.7.tar.gz

Start DSS again, and then you should be able to upload Parquet files.
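
For reference, the stop / install / start sequence above could be scripted roughly as follows. This is only a sketch under a few assumptions: DATADIR and ARCHIVE are placeholders to adjust for your installation, the dss and dssadmin scripts are assumed to live in the bin/ subdirectory of the data directory (as in the ./bin/dssadmin path quoted in the accepted solution), and the step and hadoop_integration helpers are illustrative names, not part of DSS. By default it only prints the commands; set RUN=1 to actually execute them.

```shell
#!/bin/sh
# Sketch of the stop / install / start sequence described in this thread.
# DATADIR and ARCHIVE are placeholders (assumptions); adjust to your setup.
DATADIR="${DATADIR:-/path/to/DATADIR}"
ARCHIVE="${ARCHIVE:-dataiku-dss-hadoop-standalone-libs-generic-hadoop3-10.0.7.tar.gz}"

# Print a command; execute it only when RUN=1 is set (dry run by default).
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Hypothetical helper: stop DSS, install the standalone Hadoop libraries,
# then start DSS again. Stops at the first failing step.
hadoop_integration() {
    step "$DATADIR/bin/dss" stop &&
    step "$DATADIR/bin/dssadmin" install-hadoop-integration -standaloneArchive "$ARCHIVE" &&
    step "$DATADIR/bin/dss" start
}

hadoop_integration
```

Run it once without RUN=1 to check that the printed paths match your installation, then again with RUN=1 as the user that owns the DSS data directory.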

 

Let me know if that helps. 
