Error in python process: direct access to folder is not possible.

xjiang Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 4
edited July 2024 in General Discussion

Hello,

I was going through the shared code tutorial and ran into the error below. How can I fix this? Thanks a lot.

Job failed: Error in python process: At line 11: <class 'Exception'>: Python process is running remotely, direct access to folder is not possible

import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
from to_xlsx import dataframe_to_xlsx

# Example: load a DSS dataset as a Pandas dataframe
transactions_filtered = dataiku.Dataset("ecommerce_transactions_filtered")
transactions_filtered_df = transactions_filtered.get_dataframe()

#dataframe_to_xlsx(input dataframe, folder where output file will be written, name of the output file)
dataframe_to_xlsx(transactions_filtered_df,'output_test', 'Transactions')

to_xlsx code in library:

import dataiku
import pandas as pd
import openpyxl
import io
import pickle

def dataframe_to_xlsx(input_dataframe, folder_name, output_file_name):
    folder = dataiku.Folder(folder_name)
    folder_infos = folder.get_info()
    if folder_infos["type"] == "S3":
        pickle_bytes = io.BytesIO()
        pickle.dump(input_dataframe, pickle_bytes)
        with folder.get_writer("input_dataframe.p") as w:
            w.write(pickle_bytes.getvalue())
    else:
        folder_path = folder.get_path()
        folder_path = folder_path + '/' + output_file_name + '.xlsx'
        writer = pd.ExcelWriter(folder_path, engine='openpyxl')
        input_dataframe.to_excel(writer, index=False, encoding='utf-8')
        
        writer.save()


Operating system used: Win10


Answers

  • Miguel Angel
    Miguel Angel Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118 Dataiker

    Hi,

    The "Python process is running remotely, direct access to folder is not possible" error occurs because the job that executes your code recipe runs in a container and therefore has no direct filesystem access to the DSS machine.

    Whenever possible, use the get_download_stream() method to read a file from a folder rather than get_path(). get_path() only works for a local folder (i.e. a folder hosted on the DSS filesystem) when the job is not running in a container, whereas get_download_stream() works regardless of how the job is executed or where the folder contents are stored. The same applies to writing: get_writer() or upload_stream() avoid the need for a local path. This is described further in our product documentation.
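As a rough illustration of the reading side, a helper along these lines should work whether or not the job runs in a container. The function name read_folder_file is made up for this sketch, and folder is assumed to be a dataiku.Folder handle (or any object exposing get_download_stream()).

```python
def read_folder_file(folder, file_name):
    """Return the full contents of `file_name` in a managed folder as bytes.

    `folder` is assumed to be a dataiku.Folder handle. The file is read
    through the folder API, so no local path (get_path()) is involved.
    """
    with folder.get_download_stream(file_name) as stream:
        return stream.read()
```

In a recipe this might be called as read_folder_file(dataiku.Folder("output_test"), "Transactions.xlsx"), and the returned bytes can then be wrapped in io.BytesIO and passed to pd.read_excel().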

    In the shared code tutorial we use get_path() to keep things simple, on the assumption that both the code env and the managed folder are local.
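For the library function in the question, one possible rewrite that avoids get_path() altogether is sketched below. It is not the tutorial's code: it builds the workbook in memory and streams it with get_writer(), the same call the S3 branch of the original already uses. Note that this version takes a folder handle instead of a folder name, which is a change made for this sketch so the function does not depend on where the folder lives.

```python
import io


def dataframe_to_xlsx(input_dataframe, folder, output_file_name):
    """Write a DataFrame as <output_file_name>.xlsx into a managed folder.

    `folder` is assumed to be a dataiku.Folder handle (anything exposing
    get_writer() works). No filesystem path is used, so the function does
    not rely on get_path() and should also run in a containerized job.
    """
    # Serialize the workbook into an in-memory buffer, not a local file.
    buffer = io.BytesIO()
    input_dataframe.to_excel(buffer, index=False, engine="openpyxl")

    # Stream the bytes into the folder through the managed-folder API.
    file_name = output_file_name + ".xlsx"
    with folder.get_writer(file_name) as w:
        w.write(buffer.getvalue())
    return file_name
```

The recipe call would then become dataframe_to_xlsx(transactions_filtered_df, dataiku.Folder("output_test"), "Transactions"). Because the same code path handles every folder type, the separate S3 branch is no longer needed.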
