Parquet datasets no longer viewable
I have Parquet datasets as part of my flow, but I can no longer view them. When I try to view them, I get either a "root path does not exist" error or "org/apache/hadoop/conf/Configuration, caused by: ClassNotFoundException: org.apache.hadoop.conf.Configuration". Nothing has changed except that we upgraded our Dataiku instance from version 10 to 11. Is there something that needs to be installed when upgrading to version 11 to allow use of the Parquet format? If so, could you provide instructions on how to do that?
Thanks in advance!
Operating system used: Linux
Best Answer
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hi @ryanraasch,
I think you are running into the issue described here: https://doc.dataiku.com/dss/latest/troubleshooting/problems/no-class-def-found.html
Note that the Hadoop integration must be re-run after each upgrade, so most likely it simply hasn't been run since you upgraded. Re-running the Hadoop integration should resolve the issue.
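In case it helps, here is a rough sketch of what re-running the integration can look like from the command line. It assumes the usual layout where `bin/dss` and `bin/dssadmin` live under the DSS data directory; the function name `rerun_hadoop_integration` and the path `/path/to/DSS_DATADIR` are placeholders for illustration. If your instance uses standalone Hadoop libraries rather than a cluster, the install command may need additional options that are not shown here, so please check the Hadoop setup page of the documentation for your version.

```shell
#!/bin/sh
# Sketch only: re-run the Hadoop integration after a DSS upgrade.
# Assumes bin/dss and bin/dssadmin exist under the DSS data directory.
# rerun_hadoop_integration is a hypothetical helper name, not a DSS command.

rerun_hadoop_integration() {
    datadir="$1"

    # Refuse to continue if this does not look like a DSS data directory.
    if [ ! -x "$datadir/bin/dssadmin" ]; then
        echo "No bin/dssadmin under '$datadir'; is this the DSS data directory?" >&2
        return 1
    fi

    # Stop DSS first; the exit status is ignored in case DSS is already stopped.
    "$datadir/bin/dss" stop

    # Re-run the integration, and only restart DSS if it succeeded.
    "$datadir/bin/dssadmin" install-hadoop-integration || return 1
    "$datadir/bin/dss" start
}

# Usage (as the account that runs DSS; the path is a placeholder):
#   rerun_hadoop_integration /path/to/DSS_DATADIR
```

Once DSS is back up, try opening one of the Parquet datasets again to confirm the error is gone.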
Thanks,
Sarina