Python process is running remotely, direct access to folder is not possible

Scobbyy2k3
Level 3

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import os
import glob

# Read recipe inputs
ctdna_output = dataiku.Folder("aMqrRJSr")
ctdna_output_info = ctdna_output.get_info()

# Get list of files
files = glob.glob(os.path.join(ctdna_output.get_path(), 'R2810_1624_*.csv'))  # <-- this is where I am having the problem

Please help with a solution.

3 Replies
VitaliyD
Dataiker

Hi,

From the post's title, it seems that the code is running outside of DSS (probably in a pod). As a result, you don't have access to the DSS filesystem (the same applies to managed folders hosted in another location: S3, HDFS, Azure Blob, …), so you will need to use get_download_stream/upload_stream to read/write from/to the managed folder. Please refer to the example below:

folder_handle = dataiku.Folder("jWoN2f4k")
paths = folder_handle.list_paths_in_partition()
for path in paths:
    with folder_handle.get_download_stream(path) as f:
        output_df = pd.read_csv(f)
        print(output_df.shape)
        # do something with dataframe

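Since glob won't work without local filesystem access, the paths returned by list_paths_in_partition() can instead be filtered with the standard-library fnmatch module to reproduce the original `R2810_1624_*.csv` pattern. A minimal sketch (the sample paths below are illustrative; real managed-folder paths start with a leading "/"):

```python
import fnmatch

# Sample paths as would be returned by Folder.list_paths_in_partition()
# (illustrative values, not real folder contents)
paths = ["/R2810_1624_a.csv", "/R2810_1624_b.csv", "/unrelated.txt"]

# Keep only the files matching the original glob-style pattern
matches = fnmatch.filter(paths, "/R2810_1624_*.csv")
print(matches)
```

The matched paths can then be passed one by one to get_download_stream() as in the loop above.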
Best,

Vitaliy

Scobbyy2k3
Level 3
Author

Hi Vitaliy,

 

I do have access to DSS.

If I run that code with DSS computation, it works, but once I change my computation to Kubernetes, it gives errors.

VitaliyD
Dataiker

Hi, without knowing what the error is, we can't say much. Can you add a job diag? If you can't add it here, I would suggest opening a support ticket providing the job diag of the failed job (https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html#guidelines-for-submitting-...).

Best,

Vitaliy
