Conversion to Parquet fails in Hadoop HDFS

Solved!
Benoni
Level 3

$ hadoop version
Hadoop 3.1.2
Source code repository https://github.com/apache/hadoop.git -r 1019dde65bcf12e05ef48ac71e84550d589e5d9a
Compiled by sunilg on 2019-01-29T01:39Z
Compiled with protoc 2.5.0
From source with checksum 64b8bdd4ca6e77cce75a93eb09ab2a9
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.1.2.jar

I receive this error shortly after the recipe starts:

parquet/io/api/RecordConsumer, caused by: ClassNotFoundException: parquet.io.api.RecordConsumer

It looks like Java can't find RecordConsumer.class or the .jar file that contains it. Any ideas how to fix this?

---SOLVED---



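1. Locate env-hadoop.sh in DATA_DIR/bin.
2. Open it in an editor, for example: sudo nano env-hadoop.sh
3. Find the line starting with export DKU_HADOOP_CP=
4. Append the following to the end of the value, inside the quotes:

:$DKUINSTALLDIR/lib/ivy/parquet-run/*

5. Restart DSS.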
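
For illustration, here is a minimal sketch of what the edited line might look like after step 4. The install path and the existing classpath entries are placeholders I made up, not values from a real DSS installation; keep whatever your file already contains and only append the last entry.

```shell
# Hypothetical values for illustration only; in a real env-hadoop.sh,
# DKUINSTALLDIR and the existing entries are already set by DSS.
DKUINSTALLDIR="/opt/dataiku-dss"   # assumption: your DSS install directory
export DKU_HADOOP_CP="/existing/classpath/entries:$DKUINSTALLDIR/lib/ivy/parquet-run/*"

# Inside double quotes the shell expands $DKUINSTALLDIR but leaves '*' alone;
# the JVM later treats the trailing '*' as a classpath wildcard for the jars
# in that directory.
echo "$DKU_HADOOP_CP"
```

The leading colon in the appended text matters: it separates the new entry from the ones that are already on the line.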

1 Solution
Clément_Stenac
Dataiker

Hi,

Dataiku does not support "home made" Hadoop distributions.

You may have some success by editing the "bin/env-hadoop.sh" file, locating the "DKU_HIVE_CP" line, and adding the following at the end (within the quotes):

:$DKUINSTALLDIR/lib/ivy/parquet-run/*

Then restart DSS.
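
Before or after making the change, it may help to confirm that a jar in that directory really contains the missing class. The helper below is a rough sketch, not part of DSS: the function name find_class_in_jars is made up for this example, and it assumes the unzip command is available.

```shell
# Print every jar in a directory that contains a given class file.
# Returns 0 if at least one jar matches, 1 otherwise.
find_class_in_jars() {
    dir="$1"
    class_file="$2"
    found=1
    for jar in "$dir"/*.jar; do
        [ -e "$jar" ] || continue   # the glob matched nothing
        # Compare the last column of the listing exactly, so the same class
        # under a different package prefix does not count as a match.
        if unzip -l "$jar" 2>/dev/null |
            awk -v c="$class_file" '$NF == c { hit = 1 } END { exit !hit }'; then
            echo "$jar"
            found=0
        fi
    done
    return "$found"
}

# Example call, using the directory and class named in this thread:
# find_class_in_jars "$DKUINSTALLDIR/lib/ivy/parquet-run" "parquet/io/api/RecordConsumer.class"
```

If the call prints no jar, adding the directory to the classpath is unlikely to resolve the ClassNotFoundException on its own.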

Benoni
Level 3
Author
Thanks for the answer. However, I can't find the "DKU_HIVE_CP" line you mention. You can find my hadoop-env.sh here:

https://paste.ubuntu.com/p/jgcSMTGbSd/
Benoni
Level 3
Author
I later figured out that you meant the file in DATADIR/bin. Ignore my question. Thanks for the help.
Benoni
Level 3
Author
Tested it and it works, thanks. By the way, I added it to DKU_HADOOP_CP, not DKU_HIVE_CP.