Exporting a file with a dynamic name

Vinothkumar Registered Posts: 17 ✭✭✭✭

Hi,

I have a dataset "Final_output" at the end of my flow. I want to export it into S3 as a CSV named "Final_output_$datetime.csv".

So every time the flow runs, it has to create a file with that timestamp. I tried with a variable, but it didn't work when it comes to the file name creation.

Thanks,

Vinothkumar M

Answers

  • fchataigner2 Dataiker Posts: 355 Dataiker
    edited July 17

    Hi

    DSS doesn't let you control the names of the files it produces, so you need a Python recipe writing to a managed folder. For example:

    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd
    import os
    
    # Read the input dataset into a dataframe
    ds = dataiku.Dataset("...the dataset name")
    df = ds.get_dataframe()
    
    # Resolve the managed folder's path on the local filesystem
    # (get_path() only works for folders hosted on the local filesystem)
    f = dataiku.Folder("...the folder id")
    path = f.get_path()
    
    # Write the CSV, using the "datetime" project variable in the file name
    df.to_csv(os.path.join(path, "final_output_%s.csv" % dataiku.get_custom_variables()["datetime"]))
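    If no "datetime" project variable exists yet in the project, a minimal alternative (an assumption on my side, not required by DSS) is to build the timestamp in the recipe itself with the standard library:
    
    from datetime import datetime
    
    # Assumption: no "datetime" project variable is defined; compute it locally
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    df.to_csv(os.path.join(path, "final_output_%s.csv" % timestamp))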

    or with an Export to folder recipe first, followed by a Python recipe that copies the file under the desired name, like

    # -*- coding: utf-8 -*-
    import dataiku
    
    # "f" is the folder filled by the Export to folder recipe,
    # "g" is the destination folder (use the folder ids from your flow)
    exported = dataiku.Folder("f")
    final = dataiku.Folder("g")
    
    # Grab the CSV produced by the export
    csv_in_folder = [x for x in exported.list_paths_in_partition() if x.endswith('.csv')][0]
    with exported.get_download_stream(csv_in_folder) as s:
        data = s.read()
    
    # Upload it again under the timestamped name
    final.upload_data("final_output_%s.csv" % dataiku.get_custom_variables()["datetime"], data)
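
    This second variant goes through get_download_stream()/upload_data() rather than a filesystem path, so it should also work when the destination folder lives on a non-local connection such as S3.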
  • Vinothkumar Registered Posts: 17 ✭✭✭✭

    @fchataigner2
    Thanks for your response. Works nicely on the local drive.

  • sameerk007 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1 ✭✭✭
    edited July 17

    I had something similar to do: I had to push a dataframe to S3 with a timestamp in the file name. This is what I did:

    # Import modules
    import dataiku
    import pandas as pd
    import boto3
    from datetime import datetime
    from io import StringIO
    
    # Read recipe inputs
    dataset = dataiku.Dataset("Dataset-Name")
    data_df = dataset.get_dataframe()
    
    # Get the time (12-hour clock so the AM/PM marker is meaningful;
    # underscores instead of colons to keep the key filesystem-friendly)
    date = datetime.now().strftime("%m_%d_%Y-%I_%M_%S_%p")
    
    # Put the dataframe into an in-memory buffer
    csv_buffer = StringIO()
    data_df.to_csv(csv_buffer)
    
    # Connect to S3
    session = boto3.Session(
        aws_access_key_id='your_access_id',
        aws_secret_access_key='your_secret_access_key',
    )
    
    s3_res = session.resource('s3')  # Create an S3 resource from the session
    
    bucket_name = 'your_s3_bucket_name'
    
    # Set the object name with path and date
    s3_object_name = f'path-to-output-folder/filename_{date}.csv'
    
    # Push the file to S3
    s3_res.Object(bucket_name, s3_object_name).put(Body=csv_buffer.getvalue())
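
    A hedged footnote on credentials: hardcoding AWS keys in a recipe is brittle. If the machine running DSS already has credentials available (environment variables, a shared credentials file, or an instance profile), boto3 picks them up automatically when the Session is built without arguments:
    
    # Assumption: credentials come from the environment / instance profile
    session = boto3.Session()
    s3_res = session.resource('s3')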
