Hi !!
I have two folders, both connected to the same S3 bucket but pointing to different directories.
I want to copy a subset of the first folder into the second folder with a recipe (Python if possible).
I've already tried to do this with:
f = Folder_A.get_download_stream(filename) and Folder_B.upload_stream(file_copy, f)
But it's veeeery slow (5 minutes to copy a 20 MB file).
Is there a better method to copy a file from bucket to bucket?
Thank you !!
Hi,
The get_download_stream and upload_stream methods stream the data out of S3 and back into it through the DSS server, which is why the copy is slow.
You can achieve higher efficiency by using a cloud-specific API: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client
Cheers,
Alex
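To make the suggestion concrete, here is a minimal sketch of a server-side copy with the boto3 client. It assumes boto3 is available and AWS credentials are already configured in the environment; the bucket and key names are placeholders, and the function names are mine, not from DSS:

```python
def make_copy_source(bucket, key):
    """Build the CopySource mapping expected by boto3's copy APIs."""
    return {"Bucket": bucket, "Key": key}

def copy_object_server_side(src_bucket, src_key, dst_bucket, dst_key):
    """Server-side S3 copy: the object bytes never transit the DSS server."""
    import boto3  # imported here so the helper above stays usable without boto3 installed
    s3 = boto3.client("s3")
    s3.copy(make_copy_source(src_bucket, src_key), dst_bucket, dst_key)

# Example (placeholder names):
# copy_object_server_side("my-bucket", "folder_a/file.csv",
#                         "my-bucket", "folder_b/file.csv")
```

Because the copy happens entirely inside S3, a 20 MB object should take seconds rather than minutes.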
Here is a code snippet we use to copy S3 objects within the same bucket. I suppose you could adapt it to copy objects across buckets (but your message suggests you are working on a single bucket).
import dataiku
import boto3

BUCKET_NAME = "my_bucket_name"

def get_aws_credentials(connector):
    """
    Get AWS credentials from Dataiku S3 connector
    """
    client = dataiku.api_client()
    connection = client.get_connection(connector)
    connection_info = connection.get_info()
    aws_credentials = connection_info.get_aws_credential()
    return aws_credentials

def get_bucket_handler(connector):
    aws_credentials = get_aws_credentials(connector)
    session = boto3.Session(aws_access_key_id=aws_credentials["accessKey"],
                            aws_secret_access_key=aws_credentials["secretKey"],
                            aws_session_token=aws_credentials["sessionToken"])
    s3 = session.resource('s3')
    bucket = s3.Bucket(BUCKET_NAME)
    return bucket

def copy_files_s3(source_key, target_key, connector):
    bucket = get_bucket_handler(connector)
    old_source = {'Bucket': BUCKET_NAME,
                  'Key': source_key}
    new_obj = bucket.Object(target_key)
    new_obj.copy(old_source)
Be careful that your keys do not start with '/': boto3 will not find the source object (or will fail to write the target one).
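To guard against that, you could normalize keys before any boto3 call. A tiny helper (the name is mine, purely illustrative):

```python
def s3_key(path):
    """Return a key boto3 accepts: strip any leading '/'."""
    return path.lstrip("/")

# Example: s3_key("/folder_a/file.csv") gives "folder_a/file.csv",
# and a key with no leading slash passes through unchanged.
```

Calling it on both source_key and target_key in copy_files_s3 makes the snippet safe to use with paths copied straight out of the DSS folder browser.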