How to upload a file to a subfolder using the Python API

daida
daida Registered Posts: 5 ✭✭✭✭

Dear Community,

Does anybody know how to upload a file to a subfolder using the Python API? The put_file() method of a managed folder seems to upload the file only to the folder root, not to a subfolder, even when I pass a path such as put_file('/subfolder/filename.txt', file_content).

It seems that the upload_data() method (https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.Folder.upload_data) can create the subfolder automatically, but this method exists only in the dataiku package, NOT in dataikuapi.

This seems like a very basic function, so I wonder why it is not implemented in the API.
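
For illustration, the attempted call can be sketched as follows. This is a minimal sketch, not a confirmed fix: the helper name put_file_in_subfolder is hypothetical, folder is assumed to be a DSSManagedFolder handle obtained through dataikuapi, and wrapping the bytes in a file-like object is an assumption about what put_file() accepts. Whether DSS then creates the subfolder is exactly the open question here.

```python
import io
import posixpath


def put_file_in_subfolder(folder, subfolder, filename, content):
    """Upload `content` (bytes) to `subfolder/filename` in a managed folder.

    `folder` is assumed to be a dataikuapi DSSManagedFolder handle, i.e. any
    object exposing put_file(path, file_like). Hypothetical helper.
    """
    # Build the target path with forward slashes, whatever the client OS is.
    target_path = posixpath.join("/", subfolder.strip("/"), filename)
    # Assumption: put_file() accepts a file-like object as the content.
    folder.put_file(target_path, io.BytesIO(content))
    return target_path


# Usage against a real instance (placeholders, not executed here):
# client = dataikuapi.DSSClient(host, api_key)
# folder = client.get_project("PROJECT_KEY").get_managed_folder("MANAGED_FOLDER_ID")
# put_file_in_subfolder(folder, "subfolder", "filename.txt", b"hello")
```

Listing the folder afterwards with list_contents() and checking the returned paths would show whether the file landed in the subfolder or at the root.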

Answers

  • ATsao
    ATsao Dataiker Alumni, Registered Posts: 139 ✭✭✭✭✭✭✭✭

    Hi Daida,

    Can you confirm what connection type you are using for your managed folders? You should be able to create subdirectories using something like upload_data() if the filesystem itself supports it. For example, please refer to the following thread where this was previously addressed:

    https://community.dataiku.com/t5/Plugins-Extending-Dataiku-DSS/API-to-create-a-directory-under-a-managed-folder-not-local/m-p/8094#M438

    Best,

    Andrew

  • daida
    daida Registered Posts: 5 ✭✭✭✭

    Hi Andrew,

    Thanks! The connection type is a regular "filesystem" connection.

    I want to use dataikuapi (https://doc.dataiku.com/dss/latest/python-api/outside-usage.html) from outside DSS to upload files to a subfolder of a managed folder in DSS. put_file() doesn't seem to support a subfolder path. There is also no id for the subfolder; if there were, I could use it to get a DSSManagedFolder object and then call put_file() on it.

    upload_data() doesn't exist in dataikuapi.

    Do you have any idea how to do this?

  • ATsao
    ATsao Dataiker Alumni, Registered Posts: 139 ✭✭✭✭✭✭✭✭
    edited July 17

    Hi Daida,

    Can you share the code that you are using? For example, I tested this locally and it's working for me. In my case, my input folder is a managed folder that contains nested folders and my output folder is an empty managed folder, both using the local filesystem. When I run the following code, it successfully copies over all the objects in the input folder to the output folder, including any subfolders that are present.

    import dataikuapi
    
    # Instantiate client
    DSS_host = "http://localhost:11200" # example to be changed
    DSS_apiKey = "J8dpYk1sXunjns1HZ5BxeXLavxKy9CaK" # example to be changed
    client = dataikuapi.DSSClient(DSS_host, DSS_apiKey)
    
    # Retrieve project and managed folders
    project = client.get_project("PROJECT_KEY")
    input_folder = project.get_managed_folder("INPUT_MANAGED_FOLDER_ID")
    output_folder = project.get_managed_folder("OUTPUT_MANAGED_FOLDER_ID")
    
    # Retrieve all items in the input managed folder
    input_items = input_folder.list_contents()['items']
    
    # Copy all objects from input managed folder to output managed folder
    for item in input_items:
        item_path = item['path']
        file_to_copy = input_folder.get_file(item_path)
        output_folder.put_file(item_path, file_to_copy.raw)


    Otherwise, if you are still facing issues, you can always use the dataiku package instead, even from outside DSS, by following the steps described here:
    https://doc.dataiku.com/dss/latest/python-api/outside-usage.html#using-the-dataiku-package

    Best,
    Andrew

  • daida
    daida Registered Posts: 5 ✭✭✭✭
    edited July 17

    Hi Andrew,

    Actually, my code is very similar to yours:

    import logging

    def copy_files_between_folders(src_folder, dest_folder):
        """Copy all files from source folder to destination folder
    
        Arguments:
            src_folder {dataikuapi.dss.managedfolder.DSSManagedFolder} -- the source managed folder handle
            dest_folder {dataikuapi.dss.managedfolder.DSSManagedFolder} -- the destination managed folder handle
        """
        # List files inside the source folder
        files_src = [fi['path'] for fi in src_folder.list_contents()['items']]
        # List files inside the destination folder
        files_dest = [fi['path'] for fi in dest_folder.list_contents()['items']]
    
        # Find the files present in the source but not in the destination; these should be copied
        upload_files = list(set(files_src) - set(files_dest))
        if len(upload_files) > 0:
            logging.info('{} files should be copied'.format(len(upload_files)))
            i = 0
            for fi in upload_files:
                logging.info('upload {} now'.format(fi))
                file = src_folder.get_file(fi)
                dest_folder.put_file(fi, file.content)
                logging.info('upload {} successfully'.format(fi))
                i = i + 1
            logging.info('uploaded {} files in total'.format(i))
        else:
            logging.info('Copy skipped because there is no difference between the two folders')

    I want to use this script to copy files in subdirectories of managed folders on a remote DSS instance. The folder connections should be the same as yours (screenshot attached). However, it does not work: the files are copied only to the root of the destination folder, not into the subfolders. In other words, the subfolders in the destination folder are not created automatically, the way upload_data() creates them.

    Could you suggest where I should look? Thanks a lot for your effort!

  • ATsao
    ATsao Dataiker Alumni, Registered Posts: 139 ✭✭✭✭✭✭✭✭

    Hi Daida,

    Could you try using the sample code I provided and let me know if you face the same issue? If you recall, this code will copy over contents from an input managed folder (which can contain subfolders) into an output managed folder. To keep the testing simple, I would suggest using an empty output folder.

    Also, as mentioned previously, if you continue to face issues, you can always simply use the dataiku package externally by following the steps detailed in our documentation here:

    https://doc.dataiku.com/dss/latest/python-api/outside-usage.html#using-the-dataiku-package

    Best,

    Andrew
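
The fallback mentioned above, using the dataiku package from outside DSS, might look like the following sketch. It assumes the package has been installed as described in the linked documentation. The function name upload_with_dataiku_package and all argument values are hypothetical placeholders. It relies on upload_data(), which, according to the original question, appears to create the subfolder automatically.

```python
def upload_with_dataiku_package(host, api_key, project_key, folder_id, path, data):
    """Upload `data` (bytes) to `path` in a managed folder via the dataiku package.

    Hypothetical helper: all arguments are placeholders to be replaced with
    real values for the target DSS instance.
    """
    # Imported lazily so the sketch only needs the package when it is called.
    import dataiku

    # Point the dataiku package at the remote DSS instance.
    dataiku.set_remote_dss(host, api_key)
    # Get a handle on the managed folder by id within the given project.
    folder = dataiku.Folder(folder_id, project_key=project_key)
    # upload_data() takes the path inside the folder and the raw bytes.
    folder.upload_data(path, data)
    return path


# Usage (placeholders, not executed here):
# upload_with_dataiku_package("http://localhost:11200", "API_KEY", "PROJECT_KEY",
#                             "MANAGED_FOLDER_ID", "/subfolder/filename.txt", b"hello")
```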
