Python process is running remotely, direct access to folder is not possible
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import os
import glob
# Read recipe inputs
ctdna_output = dataiku.Folder("aMqrRJSr")
ctdna_output_info = ctdna_output.get_info()
#get list of files
files = glob.glob(os.path.join(ctdna_output.get_path(), 'R2810_1624_*.csv'))  # <-- this is where I am having the problem
Please help with a solution.
Answers
-
Hi,
From the post's title, it seems that the code is running outside of DSS (probably in a pod). As a result, you don't have access to the DSS filesystem (the same applies to managed folders hosted in another location, such as S3, HDFS, or Azure Blob), so you will need to use get_download_stream/upload_stream to read from and write to the managed folder. Please refer to the example below:
folder_handle = dataiku.Folder("jWoN2f4k")
paths = folder_handle.list_paths_in_partition()
for path in paths:
    with folder_handle.get_download_stream(path) as f:
        output_df = pd.read_csv(f)
        print(output_df.shape)
        # do something with the dataframe
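To reproduce the file-name filtering from the original snippet without touching the filesystem, you can match the paths returned by list_paths_in_partition() against the pattern, read only those files through download streams, and write any result back through an upload stream. This is a minimal sketch, assuming the folder id and file pattern from the question; the output file name R2810_1624_combined.csv is just an illustrative placeholder:

import io
import os
import fnmatch
import dataiku
import pandas as pd

folder_handle = dataiku.Folder("aMqrRJSr")

# Keep only the paths whose file name matches the pattern used in the original glob call
paths = folder_handle.list_paths_in_partition()
matching = [p for p in paths if fnmatch.fnmatch(os.path.basename(p), "R2810_1624_*.csv")]

# Read each matching file through a download stream (this also works from a Kubernetes pod)
frames = []
for path in matching:
    with folder_handle.get_download_stream(path) as f:
        frames.append(pd.read_csv(f))

combined = pd.concat(frames, ignore_index=True)

# Write the combined result back via an upload stream instead of a local path
# (the output file name below is only an example)
csv_bytes = combined.to_csv(index=False).encode("utf-8")
folder_handle.upload_stream("R2810_1624_combined.csv", io.BytesIO(csv_bytes))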
Best,
Vitaliy
-
Hi Vitaliy,
I do have access to DSS.
If I run that code with the DSS (local) computation, it works, but once I change the computation to Kubernetes, it gives errors.
-
Hi, without knowing what the error is, we can't say much. Can you add a job diag? If you can't add it here, I would suggest opening a support ticket and providing the job diag of the failed job (https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html#guidelines-for-submitting-a-support-ticket).
Best,
Vitaliy