Managed Folder with Container Execution

Solved!
Astrogurl
Level 1

I am trying to run a Python recipe with containerized execution, and my model is saved in a managed folder. I understand that I have to use get_download_stream() to read the data, but the Python module that I need to use (FAISS) does not support loading the saved model from bytes. Is there a way to download the file and obtain a local path, so that I can pass that path to the module during containerized execution?


Operating system used: Mac OS Monterey

1 Solution
ZachM
Dataiker

Hi @Astrogurl,

The following code will download the file to a temporary directory first so that you can pass the path to FAISS:

 

import os.path
import shutil
import tempfile

import dataiku

folder = dataiku.Folder("FOLDER")

with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "my-file.txt")
    
    # Download the remote file to `path`
    with folder.get_download_stream("/my-file.txt") as download_stream:
        with open(path, "wb") as local_file:
            shutil.copyfileobj(download_stream, local_file)
            
    # Do stuff with the temp file here
    # It will be automatically deleted when the `temp_dir` block finishes
    print(path)
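
If you need this in more than one place, the same steps can be wrapped in a small context manager. This is only a sketch: `downloaded_to_temp_path` is a hypothetical helper name (not part of the Dataiku API), and it assumes the object you pass in exposes `get_download_stream(path)` the way `dataiku.Folder` does in the code above.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def downloaded_to_temp_path(folder, remote_path):
    """Yield a local path holding a copy of `remote_path` from `folder`.

    Hypothetical helper: `folder` is assumed to expose
    get_download_stream(path), as dataiku.Folder does.
    The temporary directory (and the file in it) is deleted on exit.
    """
    with tempfile.TemporaryDirectory() as temp_dir:
        # Keep the original file name so extension-sensitive readers still work
        local_path = os.path.join(temp_dir, os.path.basename(remote_path))

        # Copy the remote stream to the local file in chunks
        with folder.get_download_stream(remote_path) as download_stream:
            with open(local_path, "wb") as local_file:
                shutil.copyfileobj(download_stream, local_file)

        yield local_path


# Hypothetical usage inside a recipe (folder and file names are placeholders):
#
#   import dataiku
#   import faiss
#
#   folder = dataiku.Folder("FOLDER")
#   with downloaded_to_temp_path(folder, "/my-index.faiss") as path:
#       index = faiss.read_index(path)
```

Whatever reads the file has to finish loading it inside the `with` block, because the path no longer exists once the block exits.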

 

 

Thanks,

Zach

3 Replies

@ZachM Nice solution, of course, but shouldn't this really be handled by Dataiku? I would expect this sort of thing to be handled in the background, invisible to the end user.

Astrogurl
Level 1
Author

Thank you @ZachM, this works!
