How to upload a file to a subfolder using the Python API

daida
Level 2

Dear Community,

Does anybody know how to upload a file to a subfolder using the Python API? The put_file() method of a managed folder seems to upload the file only to the folder itself, not to one of its subfolders, even when I pass a path such as put_file('/subfolder/filename.txt', file_content).

It seems that the upload_data() method (https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.Folder.upload_data) can create the subfolder automatically, but this method exists only in the dataiku package and NOT in dataikuapi.

This is a very basic function, so I wonder why it is not implemented in the API.

ATsao
Dataiker

Hi Daida,

Can you confirm what connection type you are using for your managed folders? You should be able to create subdirectories using something like upload_data() if the filesystem itself supports it. For example, please refer to the following thread where this was previously addressed:

https://community.dataiku.com/t5/Plugins-Extending-Dataiku-DSS/API-to-create-a-directory-under-a-man...

Best,

Andrew

daida
Level 2
Author

Hi Andrew,

Thanks! The connection type is a standard "filesystem" connection.

I want to use dataikuapi (https://doc.dataiku.com/dss/latest/python-api/outside-usage.html) from outside DSS to upload files to a subfolder of a managed folder in DSS. As far as I can tell, put_file() doesn't support a subfolder path. There is also no id for the subfolder; if there were, I could use the id to get a DSSManagedFolder object and then call put_file() on it.
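
For reference, the call being discussed can be sketched as below. This is only an illustration, not a confirmed fix: put_file_in_subfolder is a hypothetical helper name, and the folder argument is assumed to be a dataikuapi DSSManagedFolder handle (anything with a put_file(name, f) method works). Whether the server actually creates the subfolder is exactly the open question of this thread; the sketch only shows how the path and the payload are passed.

```python
import io
import posixpath


def put_file_in_subfolder(folder, subfolder, filename, data):
    """Upload bytes to <subfolder>/<filename> through a managed folder handle.

    `folder` is assumed to expose put_file(name, f), as the dataikuapi
    DSSManagedFolder handle does. This helper name is hypothetical.
    """
    # Build an absolute, forward-slash path inside the managed folder
    path = posixpath.join("/", subfolder.strip("/"), filename)
    # Pass the payload as a file-like object
    folder.put_file(path, io.BytesIO(data))
    return path


# Possible usage (placeholders, not run here):
# client = dataikuapi.DSSClient(DSS_host, DSS_apiKey)
# folder = client.get_project("PROJECT_KEY").get_managed_folder("MANAGED_FOLDER_ID")
# put_file_in_subfolder(folder, "subfolder", "filename.txt", b"hello")
```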

The upload_data() doesn't exist in the dataikuapi.

Do you have any idea how to do this?

ATsao
Dataiker

Hi Daida,

Can you share the code that you are using? For example, I tested this locally and it's working for me. In my case, my input folder is a managed folder that contains nested folders and my output folder is an empty managed folder, both using the local filesystem. When I run the following code, it successfully copies over all the objects in the input folder to the output folder, including any subfolders that are present.

import dataikuapi

# Instantiate client
DSS_host = "http://localhost:11200" # example to be changed
DSS_apiKey = "J8dpYk1sXunjns1HZ5BxeXLavxKy9CaK" # example to be changed
client = dataikuapi.DSSClient(DSS_host, DSS_apiKey)

# Retrieve project and managed folders
project = client.get_project("PROJECT_KEY")
input_folder = project.get_managed_folder("INPUT_MANAGED_FOLDER_ID")  # placeholder: id of the source folder
output_folder = project.get_managed_folder("OUTPUT_MANAGED_FOLDER_ID")  # placeholder: id of the destination folder

# Retrieve all items in the input managed folder
input_items = input_folder.list_contents()['items']

# Copy all objects from input managed folder to output managed folder
for item in input_items:
    item_path = item['path']
    file_to_copy = input_folder.get_file(item_path)
    output_folder.put_file(item_path, file_to_copy.raw)


Otherwise, if you are still facing issues, you could always use the dataiku package instead, even from outside DSS, by following the steps described here:
https://doc.dataiku.com/dss/latest/python-api/outside-usage.html#using-the-dataiku-package
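
As a rough sketch of that alternative, assuming the dataiku package has been installed in the external Python environment as described on the linked page, the upload could look something like the following. The function name upload_with_dataiku_package and its dataiku_module parameter are hypothetical and exist only to keep the example self-contained; the calls it makes are set_remote_dss(), Folder() and upload_data() from the dataiku package.

```python
def upload_with_dataiku_package(host, api_key, project_key, folder_id,
                                path, data, dataiku_module=None):
    """Upload `data` to `path` in a managed folder via the dataiku package.

    `dataiku_module` is a hypothetical hook for passing in a stand-in module;
    when it is left as None, the real dataiku package is imported.
    """
    if dataiku_module is None:
        # Requires the dataiku package to be installed locally
        import dataiku as dataiku_module
    # Point the dataiku package at the remote DSS instance
    dataiku_module.set_remote_dss(host, api_key)
    # Get a handle on the managed folder in the given project
    folder = dataiku_module.Folder(folder_id, project_key=project_key)
    # upload_data() takes the target path inside the folder and the data
    folder.upload_data(path, data)
    return folder


# Possible usage (placeholders, not run here):
# upload_with_dataiku_package("http://localhost:11200", "YOUR_API_KEY",
#                             "PROJECT_KEY", "MANAGED_FOLDER_ID",
#                             "/subfolder/filename.txt", b"hello")
```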

Best,
Andrew

daida
Level 2
Author

Hi Andrew,

Actually, my code is very similar to yours:

import logging

def copy_files_between_folders(src_folder, dest_folder):
    """Copy the files missing from the destination folder over from the source folder

    Arguments:
        src_folder {dataikuapi.dss.managedfolder.DSSManagedFolder} -- the source managed folder handle
        dest_folder {dataikuapi.dss.managedfolder.DSSManagedFolder} -- the destination managed folder handle
    """
    # List the file paths inside the source folder
    files_src = [fi['path'] for fi in src_folder.list_contents()['items']]
    # List the file paths inside the destination folder
    files_dest = [fi['path'] for fi in dest_folder.list_contents()['items']]

    # Find the files present in the source but not in the destination; these should be copied
    upload_files = list(set(files_src) - set(files_dest))
    if len(upload_files) > 0:
        logging.info('{} files should be copied'.format(len(upload_files)))
        i = 0
        for fi in upload_files:
            logging.info('uploading {} now'.format(fi))
            file = src_folder.get_file(fi)
            dest_folder.put_file(fi, file.content)
            logging.info('uploaded {} successfully'.format(fi))
            i = i + 1
        logging.info('uploaded {} files in total'.format(i))
    else:
        logging.info('Copy skipped because there is no difference between the two folders')

I want to use this script to copy files in subdirectories of managed folders on a remote DSS instance. The connection of the folders should be the same as yours (see the attached screenshot). However, it does not work: the files are copied only into the destination folder itself, not into its subfolders. In other words, the subfolders in the destination folder are not created automatically, the way upload_data() would create them.

Do you have any suggestion about where I should check? Thanks a lot for your effort!
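
One way to narrow this down might be to compare the two listings after a copy. The sketch below is only an assumption about what is useful to check: diagnose_copy is a hypothetical helper, and it assumes the entries of list_contents()['items'] carry forward-slash paths such as '/subfolder/filename.txt' under the 'path' key. It reports which source paths are still missing from the destination and which of those seem to have landed at the top level instead of in their subfolder.

```python
import posixpath


def diagnose_copy(src_items, dest_items):
    """Compare two list_contents()['items'] listings (hypothetical helper).

    Returns (missing, flattened):
      missing   -- source paths that do not exist in the destination
      flattened -- missing subfolder paths whose file name exists at the
                   top level of the destination
    """
    src_paths = {item["path"] for item in src_items}
    dest_paths = {item["path"] for item in dest_items}
    missing = sorted(src_paths - dest_paths)

    # File names sitting directly at the top level of the destination
    root_names = {p.lstrip("/") for p in dest_paths if "/" not in p.lstrip("/")}
    flattened = [
        p for p in missing
        if "/" in p.lstrip("/") and posixpath.basename(p) in root_names
    ]
    return missing, flattened


# Possible usage with the handles from the function above (not run here):
# missing, flattened = diagnose_copy(src_folder.list_contents()['items'],
#                                    dest_folder.list_contents()['items'])
```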

ATsao
Dataiker

Hi Daida,

Could you try using the sample code I provided and let me know if you face the same issue? If you recall, this code will copy over contents from an input managed folder (which can contain subfolders) into an output managed folder. To keep the testing simple, I would suggest using an empty output folder. 

Also, as mentioned previously, if you continue to face issues, you can always simply use the dataiku package externally by following the steps detailed in our documentation here: 

https://doc.dataiku.com/dss/latest/python-api/outside-usage.html#using-the-dataiku-package

Best,

Andrew
