copy files from bucket to bucket

mass84 Registered Posts: 1 ✭✭✭✭

Hi !!

I have two folders both connected to the same S3 bucket but not the same directory.

I want to copy a subset of the first folder into the second folder with a recipe (Python if possible).

I've already tried to do this with:

f = Folder_A.get_download_stream(filename) and Folder_B.upload_stream(file_copy, f)

But it's veeeery slow (5 minutes to copy a 20 MB file).

Is there a better method to copy a file from bucket to bucket?

Thank you !!


Best Answer


  • Tanguy
    Tanguy Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2023 Posts: 112 Neuron

    Here is a code snippet we use to copy S3 objects within the same bucket. I suppose you could adapt it to copy objects across buckets (though your message suggests you are working on a single bucket).

    import dataiku
    import boto3

    BUCKET_NAME = "my_bucket_name"


    def get_aws_credentials(connector):
        """Get AWS credentials from the Dataiku S3 connection."""
        client = dataiku.api_client()
        connection = client.get_connection(connector)
        connection_info = connection.get_info()
        aws_credentials = connection_info.get_aws_credential()
        return aws_credentials


    def get_bucket_handler(connector):
        """Build a boto3 bucket handle from the connection's credentials."""
        aws_credentials = get_aws_credentials(connector)
        # boto3 also requires the secret key alongside the access key
        # and session token.
        session = boto3.Session(aws_access_key_id=aws_credentials["accessKey"],
                                aws_secret_access_key=aws_credentials["secretKey"],
                                aws_session_token=aws_credentials["sessionToken"])
        s3 = session.resource('s3')
        bucket = s3.Bucket(BUCKET_NAME)
        return bucket


    def copy_files_s3(source_key, target_key, connector):
        """Copy an object to a new key within BUCKET_NAME (server-side,
        no download/upload through the recipe)."""
        bucket = get_bucket_handler(connector)
        old_source = {'Bucket': BUCKET_NAME,
                      'Key': source_key}
        new_obj = bucket.Object(target_key)
        new_obj.copy(old_source)

    Be careful that your keys do not start with '/' as boto3 will not find your object (or will not be able to write it).
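    Following up on that warning, here is a minimal sketch of a small helper (`build_copy_source` is a hypothetical name, and the bucket names are placeholders) that strips leading slashes and builds the `CopySource` dict that boto3's `copy` expects:

    ```python
    def build_copy_source(bucket_name, key):
        # boto3 will not find an object whose key starts with '/',
        # so strip any leading slashes before building the dict.
        return {"Bucket": bucket_name, "Key": key.lstrip("/")}

    print(build_copy_source("my-source-bucket", "/folder_A/file.csv"))
    # → {'Bucket': 'my-source-bucket', 'Key': 'folder_A/file.csv'}

    # For a cross-bucket copy, pass the *source* bucket here and call
    # .copy() on an object handle in the *target* bucket, e.g.:
    #   target_bucket.Object(target_key).copy(
    #       build_copy_source("my-source-bucket", source_key))
    ```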
