
NoClassDefFoundError when reading a parquet file


I have set up an HDFS connection to access a Google Cloud Storage bucket on which I have parquet files.

After adding GoogleHadoopFileSystem to the Hadoop configuration, I can access the bucket and its files.
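
For reference, here is roughly what that setup looks like when done programmatically instead of in core-site.xml. This is just a minimal sketch: "my-bucket" is a placeholder, the credential properties are omitted, and it assumes the gcs-connector jar and hadoop-common are on the classpath.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GcsListing {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Register the GCS connector, same properties as in core-site.xml
        conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
        conf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS");
        // Auth properties (service account keyfile, etc.) omitted for brevity
        FileSystem fs = FileSystem.get(new URI("gs://my-bucket/"), conf);
        for (FileStatus status : fs.listStatus(new Path("gs://my-bucket/"))) {
            System.out.println(status.getPath()); // prints each object in the bucket root
        }
    }
}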

However, when I create a new dataset and select a parquet file (including a standard sample found at https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet), I get this error:

Oops: an unexpected error occurred

parquet/hadoop/ParquetInputFormat

Please see our options for getting help

HTTP code: 500, type: java.lang.NoClassDefFoundError

The complete error returned by the server is:

{"errorType":"java.lang.NoClassDefFoundError","message":"parquet/hadoop/ParquetInputFormat","detailedMessage":"parquet/hadoop/ParquetInputFormat","detailedMessageHTML":"\u003cspan\u003e\u003cspan class\u003d\"err-msg\"\u003eparquet/hadoop/ParquetInputFormat\u003c/span\u003e\u003c/span\u003e","stackTraceStr":"java.lang.NoClassDefFoundError: parquet/hadoop/ParquetInputFormat\n\tat com.dataiku.dip.input.formats.parquet.ParquetFormatExtractor$1.run(ParquetFormatExtractor.java:114)\n\tat com.dataiku.dip.input.formats.parquet.ParquetFormatExtractor$1.run(ParquetFormatExtractor.java:106)\n\tat java.base/java.security.AccessController.doPrivileged(Native Method)\n\tat java.base/javax.security.auth.Subject.doAs(Subject.java:423)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)\n\tat com.dataiku.dip.util.HadoopUtils.fixedUpDoAs(HadoopUtils.java:36)\n\tat com.dataiku.dip.input.formats.parquet.ParquetFormatExtractor.run(ParquetFormatExtractor.java:106)\n\tat com.dataiku.dip.datasets.fs.FileFormatDatasetTestHandler.gatherSampleRecords(FileFormatDatasetTestHandler.java:462)\n\tat com.dataiku.dip.datasets.fs.FileFormatDatasetTestHandler.detectFormats(FileFormatDatasetTestHandler.java:175)\n\tat com.dataiku.dip.server.datasets.DatasetsTestController$TestAndDetectFormatFutureThread.compute(DatasetsTestController.java:364)\n\tat com.dataiku.dip.server.datasets.DatasetsTestController$TestAndDetectFormatFutureThread.compute(DatasetsTestController.java:327)\n\tat com.dataiku.dip.futures.SimpleFutureThread.execute(SimpleFutureThread.java:36)\n\tat com.dataiku.dip.futures.FutureThreadBase.run(FutureThreadBase.java:88)\n","stackTrace":[{"file":"ParquetFormatExtractor.java","line":114,"function":"com.dataiku.dip.input.formats.parquet.ParquetFormatExtractor$1.run"},{"file":"ParquetFormatExtractor.java","line":106,"function":"com.dataiku.dip.input.formats.parquet.ParquetFormatExtractor$1.run"},{"file":"AccessController.java","line":-2,"function":"java.security.AccessController.doPrivileged"},{"file":"Subject.java","line":423,"function":"javax.security.auth.Subject.doAs"},{"file":"UserGroupInformation.java","line":1893,"function":"org.apache.hadoop.security.UserGroupInformation.doAs"},{"file":"HadoopUtils.java","line":36,"function":"com.dataiku.dip.util.HadoopUtils.fixedUpDoAs"},{"file":"ParquetFormatExtractor.java","line":106,"function":"com.dataiku.dip.input.formats.parquet.ParquetFormatExtractor.run"},{"file":"FileFormatDatasetTestHandler.java","line":462,"function":"com.dataiku.dip.datasets.fs.FileFormatDatasetTestHandler.gatherSampleRecords"},{"file":"FileFormatDatasetTestHandler.java","line":175,"function":"com.dataiku.dip.datasets.fs.FileFormatDatasetTestHandler.detectFormats"},{"file":"DatasetsTestController.java","line":364,"function":"com.dataiku.dip.server.datasets.DatasetsTestController$TestAndDetectFormatFutureThread.compute"},{"file":"DatasetsTestController.java","line":327,"function":"com.dataiku.dip.server.datasets.DatasetsTestController$TestAndDetectFormatFutureThread.compute"},{"file":"SimpleFutureThread.java","line":36,"function":"com.dataiku.dip.futures.SimpleFutureThread.execute"},{"file":"FutureThreadBase.java","line":88,"function":"com.dataiku.dip.futures.FutureThreadBase.run"}]}

Using DSS 8.0.2 with hadoop-2.10.0 and spark-2.4.5-bin-without-hadoop.

1 Reply

Hi @phildav. What "engine" are you using to create the new dataset? Wherever the computation is actually taking place, it apparently does not have the libraries needed to read parquet files installed.
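
If it helps narrow things down, here is a minimal probe (my sketch, not a DSS tool) that you can compile and run with the same classpath as the engine doing the read, for example java -cp "$(hadoop classpath):." ParquetClassProbe. It only checks whether the class named in your stack trace, and its newer org.apache counterpart, can be loaded:

public class ParquetClassProbe {
    public static void main(String[] args) {
        // First name: the old parquet-mr packaging that appears in the stack trace.
        // Second name: the newer org.apache packaging shipped by recent parquet-mr jars.
        String[] candidates = {
            "parquet.hadoop.ParquetInputFormat",
            "org.apache.parquet.hadoop.ParquetInputFormat"
        };
        for (String name : candidates) {
            try {
                Class.forName(name);
                System.out.println("FOUND   " + name);
            } catch (ClassNotFoundException e) {
                System.out.println("MISSING " + name);
            }
        }
    }
}

If both come back MISSING, the parquet libraries simply aren't on that classpath, which would explain the NoClassDefFoundError.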
