
Conversion to Parquet fails in Hadoop HDFS

Solved!
Benoni
Level 3

$ hadoop version
Hadoop 3.1.2
Source code repository https://github.com/apache/hadoop.git -r 1019dde65bcf12e05ef48ac71e84550d589e5d9a
Compiled by sunilg on 2019-01-29T01:39Z
Compiled with protoc 2.5.0
From source with checksum 64b8bdd4ca6e77cce75a93eb09ab2a9
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.1.2.jar

I receive this error shortly after the recipe starts:

parquet/io/api/RecordConsumer, caused by: ClassNotFoundException: parquet.io.api.RecordConsumer

It looks like Java can't find the RecordConsumer class, or the .jar file that contains it, on the classpath. Any ideas how to fix this?
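
One way to confirm which jar (if any) provides the missing class is to list the contents of each jar in a candidate directory. The sketch below is only an illustration: the helper name find_class_in_jars is made up for this example, and it assumes that unzip is installed.

```shell
# Hypothetical helper (not part of DSS or Hadoop): print every jar in a
# directory whose file listing contains the given class path.
# Assumes the `unzip` tool is available.
find_class_in_jars() {
  class_path="$1"   # e.g. parquet/io/api/RecordConsumer.class
  jar_dir="$2"      # directory to scan for *.jar files
  for jar in "$jar_dir"/*.jar; do
    # Skip the unexpanded glob when the directory has no jars
    [ -f "$jar" ] || continue
    if unzip -l "$jar" 2>/dev/null | grep -F -q "$class_path"; then
      echo "$jar"
    fi
  done
}
```

For example, find_class_in_jars parquet/io/api/RecordConsumer.class "$DKUINSTALLDIR/lib/ivy/parquet-run" would print the matching jars, if that directory exists in your installation. An empty result suggests the class is not provided by any jar in that directory.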



 



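---SOLVED---

1. Locate your env-hadoop.sh in DATA_DIR/bin

2. Open it in an editor, for example: sudo nano env-hadoop.sh

3. Find the line starting with "export DKU_HADOOP_CP="

4. Add the following at the end of the value, inside the quotes:

:$DKUINSTALLDIR/lib/ivy/parquet-run/*

5. Restart DSS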
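
The manual edit in the steps above could also be scripted. The following is a rough sketch, not an official Dataiku procedure: the function name append_to_cp is invented for this example, and it assumes the variable is defined on a single line of the form export NAME="...". It keeps a .bak copy of the file and does nothing if the entry is already on that line.

```shell
# Hypothetical helper: append the parquet-run entry to a classpath variable
# in env-hadoop.sh. Assumes a single-line definition: export NAME="..."
append_to_cp() {
  file="$1"   # path to env-hadoop.sh
  var="$2"    # e.g. DKU_HADOOP_CP
  # Single quotes keep $DKUINSTALLDIR literal; it is expanded later,
  # when DSS sources the file.
  entry=':$DKUINSTALLDIR/lib/ivy/parquet-run/*'
  # Do nothing if the entry is already present on the variable's line
  if grep "^export ${var}=" "$file" | grep -F -q "$entry"; then
    return 0
  fi
  # Keep a backup, then rewrite the file with the entry inserted
  # just before the closing quote
  cp "$file" "$file.bak"
  sed "s|^\(export ${var}=\".*\)\"\$|\1${entry}\"|" "$file.bak" > "$file"
}
```

Usage would look like append_to_cp "$DATA_DIR/bin/env-hadoop.sh" DKU_HADOOP_CP, where DATA_DIR stands for your DSS data directory, followed by a DSS restart.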

4 Replies
Clément_Stenac
Dataiker

Hi,



Dataiku does not support "home made" Hadoop distributions.



You may have some success by editing the "bin/env-hadoop.sh" file, locating the "DKU_HIVE_CP" line, and adding at the end (within the quotes):




:$DKUINSTALLDIR/lib/ivy/parquet-run/*


Then restart DSS

Benoni
Level 3
Author
Thanks for the answer. However, I can't find the "DKU_HIVE_CP" line you mention. You can find my hadoop-env.sh here:

https://paste.ubuntu.com/p/jgcSMTGbSd/
Benoni
Level 3
Author
I figured out later that you're talking about the file in DATADIR/bin. Ignore my question. Thanks for the help.
Benoni
Level 3
Author
Tested it and it works. Thanks. By the way, I added it to DKU_HADOOP_CP, not DKU_HIVE_CP.