
Python process is running remotely, direct access to folder is not possible

Scobbyy2k3
Level 3

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import os
import glob

# Read recipe inputs
ctdna_output = dataiku.Folder("aMqrRJSr")
ctdna_output_info = ctdna_output.get_info()

#get list of files
# Get the list of files matching the pattern -- this is where I am having the problem
files = glob.glob(os.path.join(ctdna_output.get_path(), 'R2810_1624_*.csv'))

Please help with a solution.

3 Replies
VitaliyD
Dataiker

Hi,

From the post's title, it seems that the code is running outside of DSS (probably in a pod). As a result, you don't have direct access to the DSS filesystem (the same applies to managed folders hosted in another location such as S3, HDFS, or Azure Blob), so you will need to use get_download_stream/upload_stream to read from and write to the managed folder. Please refer to the example below:

import dataiku
import pandas as pd

folder_handle = dataiku.Folder("jWoN2f4k")
paths = folder_handle.list_paths_in_partition()
for path in paths:
    with folder_handle.get_download_stream(path) as f:
        output_df = pd.read_csv(f)
        print(output_df.shape)
        # do something with the dataframe

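For the wildcard use case from the original post, the same stream API can be combined with stdlib fnmatch to replicate the glob filtering without touching the local filesystem. A minimal sketch (the folder id "aMqrRJSr" and pattern are taken from the question; the filtering helper itself is an assumption, not a Dataiku API):

```python
import fnmatch
import os.path

def matching_paths(paths, pattern="R2810_1624_*.csv"):
    """Return managed-folder paths whose basename matches the glob pattern.

    Paths from list_paths_in_partition() are folder-relative and start
    with "/", so we match on the basename rather than the full path.
    """
    return [p for p in paths if fnmatch.fnmatch(os.path.basename(p), pattern)]

# Inside DSS this would be used as (sketch, not runnable outside DSS):
# import dataiku, pandas as pd
# folder = dataiku.Folder("aMqrRJSr")
# for path in matching_paths(folder.list_paths_in_partition()):
#     with folder.get_download_stream(path) as f:
#         df = pd.read_csv(f)
```

This way the recipe never calls get_path(), so it works the same whether the computation runs locally or on Kubernetes.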
Best,

Vitaliy

Scobbyy2k3
Level 3
Author

Hi Vitaliy,

 

I do have access to DSS.

If I run that code with the local DSS computation, it works, but once I change the computation to Kubernetes, it gives errors.

VitaliyD
Dataiker

Hi, without knowing what the error is, we can't say much. Can you add a job diag? If you can't attach it here, I would suggest opening a support ticket and providing the job diag of the failed job (https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html#guidelines-for-submitting-...).

Best,

Vitaliy
