Exporting a file in dynamic name

Vinothkumar Registered Posts: 17 ✭✭✭✭

Hi,

I have my output as "Final_output" at the end of the flow. I want to export it to S3 as a CSV named "Final_output_$datetime.csv".

So every time the flow runs, it should create a file with the current timestamp in its name. I tried using a variable, but it didn't work for the file name.

Thanks,

Vinothkumar M

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi

    DSS doesn't let you control the names of the files it produces, so you need a Python recipe that writes to a managed folder. For example:

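    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd
    import os

    # Read the input dataset into a dataframe
    ds = dataiku.Dataset("...the dataset name")
    df = ds.get_dataframe()

    # get_path() is only available for folders on the DSS server's local filesystem
    f = dataiku.Folder("...the folder id")
    path = f.get_path()

    # Assumes a custom variable named "datetime" is defined
    df.to_csv(os.path.join(path, "final_output_%s.csv" % dataiku.get_custom_variables()["datetime"]))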

    Or use an Export to folder recipe first, followed by a Python recipe that renames the file, for example:

    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd

    exported = dataiku.Folder("f")
    final = dataiku.Folder("g")

    csv_in_folder = [x for x in exported.list_paths_in_partition() if x.endswith('.csv')][0]
    with exported.get_download_stream(csv_in_folder) as s:
        data = s.read()
    final.upload_data("final_output_%s.csv" % dataiku.get_custom_variables()["datetime"], data)
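
The first recipe relies on `get_path()`, which is only available for managed folders stored on the DSS server's local filesystem, and both recipes assume a custom variable named `datetime` is already set. The following is a minimal sketch of a variant that computes the timestamp inside the recipe and writes through `upload_data`, so it should not depend on a local path. The helper names `timestamped_name` and `export_csv` and the timestamp format are illustrative assumptions, not part of the Dataiku API.

```python
from datetime import datetime


def timestamped_name(prefix, now=None):
    """Return a file name such as 'Final_output_20240102_030405.csv'."""
    now = now or datetime.now()
    return "%s_%s.csv" % (prefix, now.strftime("%Y%m%d_%H%M%S"))


def export_csv(folder, csv_text, prefix="Final_output", now=None):
    """Upload CSV text to a folder-like object under a timestamped name.

    `folder` only needs an upload_data(path, data) method, which
    dataiku.Folder provides.
    """
    name = timestamped_name(prefix, now)
    folder.upload_data(name, csv_text.encode("utf-8"))
    return name


# Possible use in a DSS Python recipe (not executed here; needs dataiku):
#   import dataiku
#   df = dataiku.Dataset("...the dataset name").get_dataframe()
#   export_csv(dataiku.Folder("...the folder id"), df.to_csv(index=False))
```

Passing the folder in as an argument keeps the naming logic testable outside DSS. Note that `index=False` drops the dataframe index column, which the recipes above keep.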
  • Vinothkumar
    Vinothkumar Registered Posts: 17 ✭✭✭✭

    @fchataigner2
    Thanks for your response. Works cool for the local drive.

  • sameerk007
    sameerk007 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1 ✭✭✭

    I had to do something similar: push a dataframe to S3 with a timestamp in the file name.

    This is what I did:

    # Import modules
    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu
    import boto3
    from datetime import datetime
    from io import StringIO

    # Read recipe inputs
    dataset = dataiku.Dataset("Dataset-Name")
    data_df = dataset.get_dataframe()

    # Get the time
    date = datetime.now().strftime("%m_%d_%Y-%H:%M:%S_%p")

    # Put the dataframe into a buffer
    csv_buffer = StringIO()
    data_df.to_csv(csv_buffer)

    # Connect to S3
    session = boto3.Session(
        aws_access_key_id='your_access_id',
        aws_secret_access_key='your_secret_access_key',
    )
    s3_res = session.resource('s3')  # Create an S3 resource from the session
    bucket_name = 'your_s3_bucket_name'

    # Set the file name with path and date
    s3_object_name = f'path-to-output-folder/filename_{date}.csv'

    # Push the file to S3
    s3_res.Object(bucket_name, s3_object_name).put(Body=csv_buffer.getvalue())
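
The object key in the boto3 recipe embeds colons and a month-first date, so the files do not sort chronologically by name, and the access keys are written into the recipe itself. Here is a hedged sketch of one alternative: a year-first, colon-free timestamp and an S3 client passed in from outside. `build_s3_key` and `push_csv` are hypothetical helper names, and the timestamp format is just one reasonable choice.

```python
from datetime import datetime


def build_s3_key(folder, stem, now=None):
    """Return a key such as 'path/to/out/filename_2024-01-02_030405.csv'."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H%M%S")  # year-first, no colons
    return "%s/%s_%s.csv" % (folder.strip("/"), stem, stamp)


def push_csv(s3_client, bucket, key, csv_text):
    """Upload CSV text with any client exposing boto3's put_object()."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=csv_text.encode("utf-8"))


# Possible use in a recipe (not executed here; needs boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("s3")  # credentials resolved by boto3's default chain
#   key = build_s3_key("path-to-output-folder", "filename")
#   push_csv(client, "your_s3_bucket_name", key, data_df.to_csv())
```

Creating the client without explicit keys lets boto3 look up credentials from its usual sources, such as environment variables or an instance role, assuming one of those is configured where the recipe runs.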
