How to work with managed folders with EKS compute

Solved!
Peter_R_Knight
Level 2
I am running a Python recipe in DSS version 10.0.2.

I want to read from and write to managed folders, which I currently do using the command:

config_path = dataiku.Folder("config files").get_path()

but I get the following error:

[10:41:24] [INFO] [dku.utils]  - *************** Recipe code failed **************
[10:41:24] [INFO] [dku.utils]  - Begin Python stack
[10:41:24] [INFO] [dku.utils]  - Traceback (most recent call last):
[10:41:24] [INFO] [dku.utils]  -   File "/opt/dataiku/python/dataiku/container/exec_py_recipe.py", line 19, in <module>
[10:41:24] [INFO] [dku.utils]  -     exec(fd.read())
[10:41:24] [INFO] [dku.utils]  -   File "<string>", line 16, in <module>
[10:41:24] [INFO] [dku.utils]  -   File "/opt/dataiku/python/dataiku/core/managed_folder.py", line 151, in get_path
[10:41:24] [INFO] [dku.utils]  -     self._ensure_and_check_direct_access()
[10:41:24] [INFO] [dku.utils]  -   File "/opt/dataiku/python/dataiku/core/managed_folder.py", line 132, in _ensure_and_check_direct_access
[10:41:24] [INFO] [dku.utils]  -     raise Exception('Python process is running remotely, direct access to folder is not possible')
[10:41:24] [INFO] [dku.utils]  - Exception: Python process is running remotely, direct access to folder is not possible

 

Is there a way around this you can recommend?

Thanks

 

1 Solution
AlexT
Dataiker

Hi @Peter_R_Knight ,

Since you are running in containerized execution, you will need to use get_download_stream() instead of get_path().

get_path() only works for a local folder (i.e. a folder hosted on the filesystem) when the job is not running in a container, as explained here: https://doc.dataiku.com/dss/latest/connecting/managed_folders.html#local-vs-non-local

For example:

import dataiku

folder_handle = dataiku.Folder("FOLDER_NAME")
# Stream the file from the folder instead of resolving a filesystem path
with folder_handle.get_download_stream("/path/to/file/in/folder") as f:
    my_file = f.read()

Let me know if that helps! 

4 Replies
Peter_R_Knight
Level 2
Author

Many thanks for the pointers. 

The issue I'm going to face is that I'm calling code from GitHub that also needs to run locally, so I will end up littering it with checks along the lines of "if dataiku_flag, read/write this way; otherwise do it another way". I'm also calling other libraries that I believe can only save to a file path.

I wondered if there might be a way to copy input folders to somewhere accessible to EKS (perhaps S3), and write output to a temp location on S3, then at the end of the code copy it back to the managed folder. 

AlexT
Dataiker

@Peter_R_Knight ,

You can create the managed folder in DSS so that it is stored on S3, and interact with this remote folder in the same manner: get_download_stream() to read, and upload_stream() or upload_data() to write.

Reference doc is available here: https://doc.dataiku.com/dss/latest/python-api/managed_folders.html 

You can use local storage on the container, or in-memory objects such as StringIO or BytesIO if needed, and then upload either the files or the file-like objects to the S3-backed managed folder.
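
For code that can only work with file paths, one possible pattern is to stage the folder contents in a temporary directory on the container, run the path-based code there, and upload the results afterwards. The sketch below is only an illustration under assumptions: the helper names (download_folder, upload_folder, run_with_local_copies) are hypothetical and not part of the Dataiku API, and the folder object is assumed to expose list_paths_in_partition(), get_download_stream() and upload_stream() as described in the reference doc.

```python
import os
import tempfile


def download_folder(folder, local_dir):
    """Copy every file of a managed folder into local_dir, keeping sub-paths."""
    for remote_path in folder.list_paths_in_partition():
        # Paths in the folder are assumed to start with "/", e.g. "/sub/config.json"
        local_path = os.path.join(local_dir, remote_path.lstrip("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        with folder.get_download_stream(remote_path) as stream:
            with open(local_path, "wb") as out:
                out.write(stream.read())


def upload_folder(folder, local_dir):
    """Upload every file found under local_dir into the managed folder."""
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            local_path = os.path.join(root, name)
            relative = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            with open(local_path, "rb") as f:
                folder.upload_stream("/" + relative, f)


def run_with_local_copies(input_folder, output_folder, work):
    """Stage inputs locally, call work(in_dir, out_dir), then upload out_dir."""
    with tempfile.TemporaryDirectory() as in_dir, \
            tempfile.TemporaryDirectory() as out_dir:
        download_folder(input_folder, in_dir)
        work(in_dir, out_dir)  # path-based code sees ordinary directories
        upload_folder(output_folder, out_dir)


# Possible usage inside a recipe (folder names are placeholders):
#   import dataiku
#   run_with_local_copies(dataiku.Folder("config files"),
#                         dataiku.Folder("OUTPUT_FOLDER"),
#                         my_path_based_function)
```

With this approach the code from GitHub only ever receives plain directory paths, so it should not need any Dataiku-specific branches. Note that everything is copied through the container's local disk, which may not suit very large folders.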

Let us know if you have questions. 

Scobbyy2k3
Level 3

I am having similar problems. How do I use a glob command with this?
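
Since glob needs a filesystem path, it cannot be pointed at a remote managed folder directly. One possible workaround, sketched here under assumptions, is to list the folder's paths with list_paths_in_partition() and filter them with Python's fnmatch module. The helper name glob_folder is hypothetical and not part of the Dataiku API. Unlike glob, "*" in fnmatch also matches "/", so a pattern like "*.csv" will match files in subfolders too.

```python
import fnmatch


def glob_folder(folder, pattern):
    """Return the paths in a managed folder that match a shell-style pattern."""
    return [
        path
        for path in folder.list_paths_in_partition()
        # fnmatchcase keeps matching case-sensitive on every platform
        if fnmatch.fnmatchcase(path, pattern)
    ]


# Possible usage inside a recipe (folder name is a placeholder):
#   import dataiku
#   folder = dataiku.Folder("FOLDER_NAME")
#   for path in glob_folder(folder, "*.csv"):
#       with folder.get_download_stream(path) as f:
#           data = f.read()
```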
