Error in python process: direct access to folder is not possible.
Hello,
I was going through the shared code tutorial and ran into the error below. How can I fix this? Thanks a lot.
Job failed: Error in python process: At line 11: <class 'Exception'>: Python process is running remotely, direct access to folder is not possible
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
from to_xlsx import dataframe_to_xlsx

# Example: load a DSS dataset as a Pandas dataframe
transactions_filtered = dataiku.Dataset("ecommerce_transactions_filtered")
transactions_filtered_df = transactions_filtered.get_dataframe()

#dataframe_to_xlsx(input dataframe, folder where output file will be written, name of the output file)
dataframe_to_xlsx(transactions_filtered_df,'output_test', 'Transactions')
to_xlsx code in library:
import dataiku
import pandas as pd
import openpyxl
import io
import pickle

def dataframe_to_xlsx(input_dataframe, folder_name, output_file_name):
    folder = dataiku.Folder(folder_name)
    folder_infos = folder.get_info()
    if folder_infos["type"] == "S3":
        pickle_bytes = io.BytesIO()
        pickle.dump(input_dataframe, pickle_bytes)
        with folder.get_writer("input_dataframe.p") as w:
            w.write(pickle_bytes.getvalue())
    else:
        folder_path = folder.get_path()
        folder_path = folder_path + '/' + output_file_name + '.xlsx'
        writer = pd.ExcelWriter(folder_path, engine='openpyxl')
        input_dataframe.to_excel(writer, index=False, encoding='utf-8')
        writer.save()
Operating system used: Win10
Answers
Miguel Angel Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118 Dataiker
Hi,
The "Python process is running remotely, direct access to folder is not possible" error occurs because the job that executes your code recipe runs containerized, and it therefore does not have direct filesystem access to the DSS machine.
Whenever possible, it is advisable to use the get_download_stream() method to read a file from a folder (and upload_stream() or get_writer() to write one), rather than get_path(). While get_path() will only work for a local folder (i.e. a folder hosted on the filesystem, when the job is not running in a container), the stream-based methods work regardless of how the job is executed or where the folder contents are stored. This is described further in our product documentation.
In the shared code tutorial we use get_path() to simplify matters, since we're assuming that both the code env and the managed folder are local.
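For illustration, here is a minimal sketch of how the library function could avoid get_path() entirely by writing the Excel file to an in-memory buffer and uploading it with the folder's upload_stream() method. This is not the tutorial's code. It assumes openpyxl is installed in the code env, it takes the folder object itself rather than the folder name (a change made here so the function can be tried without DSS), and read_folder_file is a hypothetical helper name added only to show the matching read path.

```python
import io


def dataframe_to_xlsx(input_dataframe, folder, output_file_name):
    """Write a dataframe as <output_file_name>.xlsx into a managed folder.

    `folder` is expected to be a dataiku.Folder (or any object exposing
    upload_stream(path, file_like)). No local filesystem path is used,
    so this should also work when the recipe runs in a container.
    """
    buffer = io.BytesIO()
    # pandas can write an Excel file to a file-like object.
    input_dataframe.to_excel(buffer, index=False, engine="openpyxl")
    buffer.seek(0)  # rewind so the upload starts at the first byte
    target_path = output_file_name + ".xlsx"
    folder.upload_stream(target_path, buffer)
    return target_path


def read_folder_file(folder, path):
    """Hypothetical helper: read a file back as bytes via a download stream."""
    with folder.get_download_stream(path) as stream:
        return stream.read()


# Assumed usage inside the recipe (requires a DSS environment):
#   import dataiku
#   folder = dataiku.Folder("output_test")
#   dataframe_to_xlsx(transactions_filtered_df, folder, "Transactions")
```

Because the function never asks for a filesystem path, the same code path can serve local, S3, and containerized setups, which would make the branch on the folder type unnecessary.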