I am running a Python recipe in DSS version 10.0.2.
I want to read from and write to managed folders, which I currently do using:
config_path = dataiku.Folder("config files").get_path()
but I get the following error:
[10:41:24] [INFO] [dku.utils] - *************** Recipe code failed **************
[10:41:24] [INFO] [dku.utils] - Begin Python stack
[10:41:24] [INFO] [dku.utils] - Traceback (most recent call last):
[10:41:24] [INFO] [dku.utils] -   File "/opt/dataiku/python/dataiku/container/exec_py_recipe.py", line 19, in <module>
[10:41:24] [INFO] [dku.utils] -     exec(fd.read())
[10:41:24] [INFO] [dku.utils] -   File "<string>", line 16, in <module>
[10:41:24] [INFO] [dku.utils] -   File "/opt/dataiku/python/dataiku/core/managed_folder.py", line 151, in get_path
[10:41:24] [INFO] [dku.utils] -     self._ensure_and_check_direct_access()
[10:41:24] [INFO] [dku.utils] -   File "/opt/dataiku/python/dataiku/core/managed_folder.py", line 132, in _ensure_and_check_direct_access
[10:41:24] [INFO] [dku.utils] -     raise Exception('Python process is running remotely, direct access to folder is not possible')
[10:41:24] [INFO] [dku.utils] - Exception: Python process is running remotely, direct access to folder is not possible
Is there a way around this you can recommend?
Thanks
Hi @Peter_R_Knight ,
Since you are running with containerized execution, you will need to use get_download_stream() instead of get_path().
As explained here: https://doc.dataiku.com/dss/latest/connecting/managed_folders.html#local-vs-non-local
import dataiku

folder_handle = dataiku.Folder("FOLDER_NAME")
# Stream the file's contents instead of reading it from a local path
with folder_handle.get_download_stream("/path/to/file/in/folder") as f:
    my_file = f.read()
Let me know if that helps!
Many thanks for the pointers.
The issue I'm going to face is that I'm calling GitHub code that also needs to run locally, so I will end up having to litter that code with "if dataiku_flag, read/write this way, else do it another way" branches. I'm also calling other libraries that I believe can only save to a file path.
I wondered if there might be a way to copy input folders to somewhere accessible to EKS (perhaps S3), and write output to a temp location on S3, then at the end of the code copy it back to the managed folder.
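Roughly what I have in mind is something like the sketch below (the folder and file names are just placeholders, and I haven't tested this):

import os
import tempfile
import dataiku

input_folder = dataiku.Folder("config files")   # placeholder folder names
output_folder = dataiku.Folder("results")

with tempfile.TemporaryDirectory() as tmp_dir:
    # copy the input out of the managed folder to a real local path
    local_config = os.path.join(tmp_dir, "config.yaml")
    with input_folder.get_download_stream("/config.yaml") as stream, open(local_config, "wb") as f:
        f.write(stream.read())

    # ... run the GitHub / third-party code against plain file paths,
    # assuming it writes output.csv into tmp_dir ...
    local_output = os.path.join(tmp_dir, "output.csv")

    # copy the result back into a managed folder at the end
    output_folder.upload_file("/output.csv", local_output)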
You can create the managed folder in DSS so that it is stored on S3, and interact with the remote managed folder in the same manner, using get_download_stream() for reads and upload_stream() or upload_data() for writes.
Reference doc is available here: https://doc.dataiku.com/dss/latest/python-api/managed_folders.html
If needed, you can use local storage on the container, or in-memory buffers such as StringIO or BytesIO, and then upload either the files or the file-like objects to the S3-backed managed folder, for example:
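Here is a minimal sketch of the upload side (the folder and file names are illustrative):

import io
import pandas as pd
import dataiku

folder = dataiku.Folder("FOLDER_NAME")

# upload_data() writes raw bytes to a path inside the managed folder
folder.upload_data("/reports/summary.txt", b"run completed")

# upload_stream() writes a file-like object, e.g. a DataFrame serialized in memory
df = pd.DataFrame({"a": [1, 2, 3]})
buf = io.BytesIO(df.to_csv(index=False).encode("utf-8"))
folder.upload_stream("/reports/results.csv", buf)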
Let us know if you have questions.
I am having similar problems. How do I use a glob command with this?