Hi,
I have some model files in a managed folder stored on DSS. I want to copy them to a new folder on S3. Is there a way to do this using a Python recipe?
Hi @harsha_dataiku,
Yes, you can do so with the managed folder read/write APIs: https://doc.dataiku.com/dss/latest/connecting/managed_folders.html#usage-in-python
The following code should work for transferring files from a local folder to a remote folder:
import dataiku

input_folder = dataiku.Folder("lE3JuuYn")  # replace with your folder name or folder ID (retrieved from the URL)
output_folder = dataiku.Folder("Pxyks4jt")

for input_path in input_folder.list_paths_in_partition():
    with input_folder.get_download_stream(input_path) as f:
        data = f.read()
    output_path = input_path.split('/')[-1]  # keep only the file name
    with output_folder.get_writer(output_path) as w:
        w.write(data)
    print("Successfully transferred {}".format(output_path))
Best,
Jordan
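One caveat with the snippet above: f.read() loads each file entirely into memory, which can be an issue for large binaries. A minimal sketch of a chunked copy helper that works with any pair of file-like objects, such as those returned by get_download_stream and get_writer (the helper name copy_stream and the chunk size are my own choices, not part of the dataiku API):

```python
def copy_stream(src, dst, chunk_size=1024 * 1024):
    """Copy from a readable file-like object to a writable one in fixed-size chunks."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty result signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total  # number of bytes copied
```

Inside the loop, you could then call copy_stream(f, w) instead of reading and rewriting each whole file in one step, keeping memory usage bounded regardless of file size.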
Hi, this throws an error when I try to copy PDF files. Is there a way to copy them? Thanks in advance.
An error? What error?
There is a much better way of doing this, and it doesn't require any code. First, use a Files in Folder dataset to group your files into a dataset. Then simply use a Sync recipe to move them from the Files in Folder dataset to a bucket in any cloud. See below my sample flow using an Azure bucket; S3 works the same way. And if you use this hidden feature of the Files in Folder dataset, you can even get full traceability of where each record came from. This solution will be much faster than Python.
And if you need to keep the files in their original format, either use s3fs-fuse or this solution to mount the S3 bucket on your current DSS machine and copy the files manually from each directory.
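For reference, mounting a bucket with s3fs-fuse typically looks like the sketch below. The bucket name, mount point, credentials, and source path are all placeholders, and the exact options depend on your environment, so treat this as a setup outline rather than exact commands:

```shell
# Assumes s3fs-fuse is already installed on the DSS machine.
# Credentials file format is ACCESS_KEY_ID:SECRET_ACCESS_KEY (placeholder values here).
echo "AKIA_PLACEHOLDER:SECRET_PLACEHOLDER" > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# Mount the bucket (my-model-bucket and /mnt/my-model-bucket are placeholders).
mkdir -p /mnt/my-model-bucket
s3fs my-model-bucket /mnt/my-model-bucket -o passwd_file=${HOME}/.passwd-s3fs

# Then copy the managed folder contents with ordinary shell tools
# (the source path below is a placeholder for your DSS data directory).
cp -r /path/to/dss_data/managed_folders/PROJECT/FOLDER_ID/* /mnt/my-model-bucket/
```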