
How to work with managed folders with EKS compute

Solved!
Peter_R_Knight
Level 2

I am running a Python recipe in DSS version 10.0.2.

I want to read from and write to managed folders, which I currently do using:

config_path = dataiku.Folder("config files").get_path()

but I get the following error:

[10:41:24] [INFO] [dku.utils]  - *************** Recipe code failed **************
[10:41:24] [INFO] [dku.utils]  - Begin Python stack
[10:41:24] [INFO] [dku.utils]  - Traceback (most recent call last):
[10:41:24] [INFO] [dku.utils]  -   File "/opt/dataiku/python/dataiku/container/exec_py_recipe.py", line 19, in <module>
[10:41:24] [INFO] [dku.utils]  -     exec(fd.read())
[10:41:24] [INFO] [dku.utils]  -   File "<string>", line 16, in <module>
[10:41:24] [INFO] [dku.utils]  -   File "/opt/dataiku/python/dataiku/core/managed_folder.py", line 151, in get_path
[10:41:24] [INFO] [dku.utils]  -     self._ensure_and_check_direct_access()
[10:41:24] [INFO] [dku.utils]  -   File "/opt/dataiku/python/dataiku/core/managed_folder.py", line 132, in _ensure_and_check_direct_access
[10:41:24] [INFO] [dku.utils]  -     raise Exception('Python process is running remotely, direct access to folder is not possible')
[10:41:24] [INFO] [dku.utils]  - Exception: Python process is running remotely, direct access to folder is not possible

 

Is there a way around this you can recommend?

Thanks

3 Replies
AlexT
Dataiker

Hi @Peter_R_Knight ,

Since you are running with containerized execution, you will need to use get_download_stream() instead.

As explained here: https://doc.dataiku.com/dss/latest/connecting/managed_folders.html#local-vs-non-local 

get_path() only works for a local folder, i.e. a folder hosted on the DSS server's filesystem, and only when the job is not running in a container. For example:

import dataiku

folder_handle = dataiku.Folder("FOLDER_NAME")
# Stream the file contents instead of resolving a local filesystem path
with folder_handle.get_download_stream("/path/to/file/in/folder") as f:
    my_file = f.read()
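
Note that f.read() returns the file contents as bytes; decode or parse them as needed (e.g. my_file.decode("utf-8") for a text file).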

Let me know if that helps! 

Peter_R_Knight
Level 2
Author

Many thanks for the pointers. 

The issue I'm going to face is that I'm calling GitHub code that also needs to run locally, so I will end up littering that code with "if running in Dataiku, read/write this way; else do it another way" branches. I'm also calling other libraries that I believe can only save to a file path.

I wondered if there might be a way to copy input folders to somewhere accessible to EKS (perhaps S3), write output to a temp location on S3, and then at the end of the code copy it back to the managed folder.

AlexT
Dataiker

@Peter_R_Knight ,

You can create the folder in DSS so that it is stored in S3, and interact with the remote managed folder in the same manner, using get_download_stream() together with upload_stream() or upload_data().

Reference doc is available here: https://doc.dataiku.com/dss/latest/python-api/managed_folders.html 
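
For example, a minimal sketch of streaming reads and writes (the folder names and file paths here are placeholders):

import dataiku

input_folder = dataiku.Folder("config files")   # placeholder names
output_folder = dataiku.Folder("results")

# Read a file from the (possibly S3-backed) managed folder as bytes
with input_folder.get_download_stream("/settings.json") as f:
    data = f.read()

# Write bytes back without resolving a local filesystem path
output_folder.upload_data("/processed/settings.json", data)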

You can use local storage on the container, or in-memory buffers like io.StringIO / io.BytesIO if needed, and then upload either the files or file-like objects to the S3-backed managed folder.
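
For instance, here is a rough sketch of the staging pattern you described (the folder names, file names, and processing step are placeholders): download the inputs to a temporary directory on the container, run the path-based libraries against it, then upload the results back.

import os
import tempfile
import dataiku

input_folder = dataiku.Folder("config files")   # placeholder names
output_folder = dataiku.Folder("results")

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stage every file from the input managed folder onto the container's local disk
    for path in input_folder.list_paths_in_partition():
        local_path = os.path.join(tmp_dir, path.lstrip("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        with input_folder.get_download_stream(path) as stream:
            with open(local_path, "wb") as out:
                out.write(stream.read())

    # ... run the path-based libraries against tmp_dir here ...

    # Upload a result file back to the output managed folder
    with open(os.path.join(tmp_dir, "output.csv"), "rb") as f:
        output_folder.upload_stream("/output.csv", f)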

Let us know if you have questions. 
