# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import os
import glob
# Read recipe inputs
ctdna_output = dataiku.Folder("aMqrRJSr")
ctdna_output_info = ctdna_output.get_info()
# Get the list of files matching the pattern
files = glob.glob(os.path.join(ctdna_output.get_path(), 'R2810_1624_*.csv'))  # <-- this is where I am having the problem
Please help with a solution.
Hi,
From the post's title, it seems that the code is running outside of DSS (probably in a pod). As a result, you don't have access to the DSS filesystem (the same applies to managed folders hosted in another location, such as S3, HDFS, Azure Blob, etc.), so you will need to use get_download_stream/upload_stream to read from and write to the managed folder. Please refer to the example below:
folder_handle = dataiku.Folder("jWoN2f4k")
paths = folder_handle.list_paths_in_partition()
for path in paths:
    with folder_handle.get_download_stream(path) as f:
        output_df = pd.read_csv(f)
        print(output_df.shape)
        # do something with the dataframe
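As a side note, if you only need the files matching your R2810_1624_*.csv pattern, you can filter the paths returned by list_paths_in_partition() with Python's fnmatch module instead of glob. Here is a minimal sketch, reusing the folder ID from your recipe:
import fnmatch
import os
import dataiku
import pandas as pd

# Same managed folder ID as in the original recipe
ctdna_output = dataiku.Folder("aMqrRJSr")

# list_paths_in_partition() works whether the recipe runs locally or in a container,
# unlike get_path(), which requires direct filesystem access to the folder contents
all_paths = ctdna_output.list_paths_in_partition()
csv_paths = [p for p in all_paths if fnmatch.fnmatch(os.path.basename(p), 'R2810_1624_*.csv')]

# Read each matching file through a download stream
dfs = []
for path in csv_paths:
    with ctdna_output.get_download_stream(path) as f:
        dfs.append(pd.read_csv(f))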
Best,
Vitaliy
Hi Vitaliy,
I do have access to DSS.
If I run that code with the local DSS computation, it works, but once I change the computation to Kubernetes, it gives errors.
Hi, without knowing what the error is, we can't say much. Could you add a job diag? If you can't attach it here, I would suggest opening a support ticket and providing the job diag of the failed job (https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html#guidelines-for-submitting-...).
Best,
Vitaliy