Managed Folder not accessible when running recipe on Containerized Execution
Hi,
I've created a custom recipe with a Dataset and a Managed Folder as inputs and a Dataset as output. The recipe runs well on DSS, but it doesn't work when it's configured to run on Containerized Execution. These are the steps I took, along with the error I received after running the recipe.
Setting up
- Added the plugin (with the recipe)
- In the plugin summary page, created a new code environment with the following parameters:
- Create new: Managed by DSS
- Python: Python36
- Build images for: Selected Container: mycontainer
- Built the new environment without errors
Using it
- In my project, went to Settings, configured containerized execution to use 'mycontainer', set the code env selection to my python36 env, and checked 'Prevent override by recipes'.
- In my project, clicked on the Dataset and selected my recipe.
- In the recipe, set my Managed Folder (which points to the filesystem) as input and chose the output dataset.
- Ran the recipe.
Error received:
FileNotFoundError: [Errno 2] No such file or directory: '/mtn/...some path/57CFjKsj'
As far as I can tell, the Containerized Execution cannot see the files in the Managed Folder. Could you advise what is going on?
Thank you.
Answers
-
Hi,
To run a recipe on Containerized Execution when using managed folders as input/output, you will need to use the Dataiku API for reading/writing files, because the container does not have direct access to the filesystem of the DSS host where the folder's files are stored.
In other words, the "regular" local filesystem access (opening files under the folder's local path) that works with local execution no longer works. Instead, use the following methods to interact with the managed folder:
- Reading
- List paths: https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.core.managed_folder.Folder.list_paths_in_partition
- Download a file from the managed folder to the container: https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.core.managed_folder.Folder.get_download_stream
- Then you can use the regular filesystem API on the container to load data
- Writing
- If you already have a file on the filesystem of the container, you can upload it to the managed folder with https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.core.managed_folder.Folder.upload_file
- If you want to use a file-like object (faster, no need to write to the container filesystem), you can use https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.core.managed_folder.Folder.upload_stream
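A minimal Python sketch of the reading and writing flow above, assuming `folder` is a `dataiku.Folder` object. It only calls `list_paths_in_partition()`, `get_download_stream()` and `upload_file()` from the linked docs, plus `upload_stream()`, which is assumed here to be the file-like-object method the second writing bullet refers to. The helper names (`copy_folder_to_local`, `upload_local_file`, `write_text_to_folder`) are hypothetical.

```python
import io
import os
import shutil
import tempfile


def copy_folder_to_local(folder, local_dir=None):
    """Download every file in a managed folder to the container's local disk.

    `folder` is assumed to be a dataiku.Folder; only list_paths_in_partition()
    and get_download_stream() are called on it. Returns the local file paths.
    """
    if local_dir is None:
        # Scratch directory on the container's own filesystem
        local_dir = tempfile.mkdtemp()
    local_paths = []
    for path in folder.list_paths_in_partition():
        # Folder paths are rooted at "/", so strip it before joining
        target = os.path.join(local_dir, path.lstrip("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # Stream the remote bytes into a local file
        with folder.get_download_stream(path) as stream, open(target, "wb") as out:
            shutil.copyfileobj(stream, out)
        local_paths.append(target)
    return local_paths


def upload_local_file(folder, path, local_file):
    """Copy a file that already exists on the container into the folder."""
    folder.upload_file(path, local_file)


def write_text_to_folder(folder, path, text):
    """Write text to the folder from memory, without a temporary file."""
    # upload_stream() reads from any file-like object
    folder.upload_stream(path, io.BytesIO(text.encode("utf-8")))
```

In a plugin recipe, `folder` would typically be obtained with something like `dataiku.Folder(get_input_names_for_role("input_folder")[0])`, where `get_input_names_for_role` comes from `dataiku.customrecipe` and the role name `"input_folder"` is hypothetical (it must match the role declared in your recipe's `recipe.json`). Once the files are copied locally, the regular filesystem API works on the returned paths.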
The same APIs also exist in R, as documented here:
- Reading: https://doc.dataiku.com/dss/api/6.0/R/dataiku/reference/dkuManagedFolderCopyToLocal.html
- Writing: https://doc.dataiku.com/dss/api/6.0/R/dataiku/reference/dkuManagedFolderCopyFromLocal.html
Hope it helps,
Alex Combessie
-
Thanks Alex, let me try this out and get back to you.
-
Hi @Alex_Combessie, thanks for that. That's really helpful. I have another question: how do I remove files in the managed folder on containerized execution?