Reading a file from HDFS Managed Folder

navraj28
I have a Managed Folder pointing to an HDFS folder. The HDFS connection is "hdfs_root". I am able to browse files from the Settings page. I am using DSS version 7.0.2. In my recipe I have some pretty simple code:

    mf7 = dataiku.Folder("w1NS8KLT")
    paths = mf7.list_paths_in_partition()
    print(paths)
    with mf7.get_download_stream(paths[0]) as f:
        data = f.read()

The code is listing the contents of the folder. However, I am seeing this error in the stack trace:

    [17:31:01] [INFO] [dku.fsproviders.hdfs] - Enumerating HDFS Filesystem from root : /user/dataiku/dss_managed_datasets/SEMANTICPIPELINE/Test1
    [17:31:01] [WARN] [dku.job.activity] - Failed to fill written size for SEMANTICPIPELINE.Test1
    com.dataiku.dip.exceptions.DataStoreIOException: Root path of the dataset does not exist
        at com.dataiku.dip.datasets.fs.HDFSDatasetHandler.enumerateFilesystem(HDFSDatasetHandler.java:307)
        at com.dataiku.dip.datasets.fs.AbstractFSDatasetHandler.enumerateFilesystem(AbstractFSDatasetHandler.java:464)
        at com.dataiku.dip.datasets.fs.AbstractFSDatasetHandler.getRequiredFiles(AbstractFSDatasetHandler.java:253)
        at com.dataiku.dip.datasets.fs.AbstractFSDatasetHandler.getRequiredFiles(AbstractFSDatasetHandler.java:285)
        at com.dataiku.dip.dataflow.JobActivity.fillTargetWrittenSizes(JobActivity.java:349)
        at com.dataiku.dip.recipes.code.python.PythonRecipeRunner.run(PythonRecipeRunner.java:78)
        at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:380)
Clément_Stenac
Dataiker

Hi,

In order to avoid duplicating work for our teams, we kindly ask that you please refrain from posting the same question both to this community and to our technical customer support at the same time.

If you wish for the whole community to be able to see your question and reply, then please post to this community. If you prefer the matter to remain private, possibly because your post contains sensitive information, then please use other channels, but not both at the same time.

As you noted in the support ticket, this is a warning, not an error. Your recipe declares a dataset as its output but never writes any data to it. Since nothing was written, the dataset's root path does not exist yet, so DSS cannot compute how much data was written and logs this warning instead.
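If you want the warning to go away, the recipe has to actually write something to its output dataset. Here is a minimal sketch of one way to do that, not the only one. It assumes the output dataset is the "Test1" dataset named in the log, that pandas is available in the recipe's code environment, and that one row per file (path and size) is a useful thing to write. The helper names `summarize_folder` and `run_in_recipe` are made up for this example.

```python
def summarize_folder(folder):
    """Return one row per file in the folder: its path and its size in bytes.

    `folder` is any object exposing list_paths_in_partition() and
    get_download_stream(path), such as a dataiku.Folder handle.
    """
    rows = []
    for path in folder.list_paths_in_partition():
        # Same read pattern as in the question, applied to every file
        with folder.get_download_stream(path) as stream:
            data = stream.read()
        rows.append({"path": path, "size_bytes": len(data)})
    return rows


def run_in_recipe():
    """Hypothetical entry point; only works inside a DSS Python recipe."""
    import dataiku
    import pandas as pd

    rows = summarize_folder(dataiku.Folder("w1NS8KLT"))
    # Assumption: "Test1" is the recipe's declared output dataset
    dataiku.Dataset("Test1").write_with_schema(pd.DataFrame(rows))
```

Calling `run_in_recipe()` at the end of the recipe writes the rows to the output dataset, which gives DSS actual data to measure. If the recipe is only meant to read from the folder and produce no data, the warning is harmless and can be ignored.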

 

 
