Copying data from local managed folder to S3 managed folder

Solved!
harsha_dataiku
Level 2

Hi,

I have some model files in a managed folder stored on DSS. I want to copy them to a new folder on S3. Is there a way to do this using a Python recipe?

1 Solution
JordanB
Dataiker

Hi @harsha_dataiku,

Yes, you can do so with the managed folder read/write APIs: https://doc.dataiku.com/dss/latest/connecting/managed_folders.html#usage-in-python

The following code should work for transferring files from a local folder to a remote folder:

import dataiku

input_folder = dataiku.Folder("lE3JuuYn")  # replace with folder name or folder ID (retrieved from URL)
input_folder_files = input_folder.list_paths_in_partition()

output_folder = dataiku.Folder("Pxyks4jt")  # replace with folder name or folder ID

for path in input_folder_files:
    # Read the file as bytes from the input managed folder
    with input_folder.get_download_stream(path) as f:
        data = f.read()
    # Write it under its base name in the output (S3) managed folder
    output_path = path.split('/')[-1]
    with output_folder.get_writer(output_path) as w:
        w.write(data)
    print("Successfully transferred {}".format(output_path))
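
To double-check the transfer, you can list what now sits in the output folder. A quick check, reusing the output_folder handle from the snippet above:

# Sanity check: list the paths now present in the S3 managed folder
print(output_folder.list_paths_in_partition())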

 

Best,

Jordan


5 Replies

harsha_dataiku
Level 2
Author

Hi, it throws an error when I try to copy PDF files. Is there a way to copy them? Thanks in advance.


An error? What error?

harsha_dataiku
Level 2
Author
 

I am trying to copy a local PDF to the S3 folder, and I am facing the below error.


There is a much better way of doing this, and it doesn't require any code. First, use a Files in Folder dataset to group your files into a dataset. Then simply use a Sync recipe to move them from the Files in Folder dataset to a bucket in any cloud. See my sample flow below using an Azure bucket; S3 will work the same way. And if you use this hidden feature of the Files in Folder dataset, you can even get full traceability of where each record came from. This solution will be much faster than Python.

And if you need to keep the files in their original format, either use s3fs-fuse or this solution to mount the S3 bucket on your current DSS machine and copy the files manually from each directory.
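
Once the bucket is mounted, the copy is plain filesystem work. A minimal sketch, assuming the bucket is mounted at /mnt/s3_bucket and the managed folder's files live under /data/managed_folder (both paths are hypothetical placeholders):

import shutil
from pathlib import Path

src = Path("/data/managed_folder")   # hypothetical path of the local managed folder
dst = Path("/mnt/s3_bucket/models")  # hypothetical s3fs-fuse mount point

dst.mkdir(parents=True, exist_ok=True)
for f in src.iterdir():
    if f.is_file():
        shutil.copy2(f, dst / f.name)  # byte-for-byte copy keeps the original format
        print("Copied {}".format(f.name))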

 

[Screenshot: sample flow syncing Files in Folder datasets to an Azure bucket]
