Exporting a file with a dynamic name
Vinothkumar
Registered Posts: 17 ✭✭✭✭
Hi,
My flow ends with a dataset named "Final_output". I want to export it to S3 as a CSV named "Final_output_$datetime.csv".
Every time the flow runs, it should create a file with the current timestamp. I tried using a variable, but it didn't work for the file name.
Thanks,
Vinothkumar M
Answers
-
Hi
DSS doesn't let you control the names of the files it produces, so you need a Python recipe that writes to a managed folder. For example:
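# -*- coding: utf-8 -*-
import dataiku
import pandas as pd
import os

# Read the input dataset into a pandas DataFrame
ds = dataiku.Dataset("...the dataset name")
df = ds.get_dataframe()

# Resolve the managed folder's path on disk (local filesystem folders only)
f = dataiku.Folder("...the folder id")
path = f.get_path()

# Write the CSV, using the "datetime" variable in the file name
df.to_csv(os.path.join(path, "final_output_%s.csv" % dataiku.get_custom_variables()["datetime"]))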
Alternatively, use an Export to folder recipe first, followed by a Python recipe that renames the file:
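# -*- coding: utf-8 -*-
import dataiku
import pandas as pd

exported = dataiku.Folder("f")  # folder written by the Export to folder recipe
final = dataiku.Folder("g")     # folder that receives the renamed file

# Pick the first CSV file found in the exported folder
csv_in_folder = [x for x in exported.list_paths_in_partition() if x.endswith('.csv')][0]

# Read its contents
with exported.get_download_stream(csv_in_folder) as s:
    data = s.read()

# Upload it under the new name built from the "datetime" variable
final.upload_data("final_output_%s.csv" % dataiku.get_custom_variables()["datetime"], data)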
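Both snippets above assume that a project variable named datetime is defined, and the first one relies on get_path(), which is only available for managed folders on the local filesystem. As a minimal sketch under those caveats, the timestamp can instead be built in the recipe itself and the CSV sent through upload_data, which the second snippet already uses. The helper names timestamped_name and write_csv_to_folder, and the timestamp format, are illustrative assumptions and not part of DSS.

```python
from datetime import datetime


def timestamped_name(prefix, now=None, fmt="%Y%m%d_%H%M%S"):
    """Build a file name like '<prefix>_<timestamp>.csv' (hypothetical helper)."""
    now = now or datetime.now()
    return "%s_%s.csv" % (prefix, now.strftime(fmt))


def write_csv_to_folder(folder, df, file_name):
    """Serialize df to CSV in memory and upload it to a managed folder.

    `folder` is expected to behave like dataiku.Folder (upload_data(path, data)).
    `df` is expected to behave like a pandas DataFrame, whose to_csv() returns
    the CSV text when no path is given.
    """
    # index=False drops the DataFrame index column; remove it to keep the index
    data = df.to_csv(index=False).encode("utf-8")
    folder.upload_data(file_name, data)
    return file_name


# Inside a DSS Python recipe this could be used as (names are placeholders):
#   import dataiku
#   df = dataiku.Dataset("...the dataset name").get_dataframe()
#   folder = dataiku.Folder("...the folder id")
#   write_csv_to_folder(folder, df, timestamped_name("final_output"))
```

Because the data goes through upload_data rather than a filesystem path, the same recipe should also work when the managed folder lives on a remote connection such as S3.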
-
@fchataigner2
Thanks for your response. It works well for the local drive.
-
sameerk007 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1 ✭✭✭
I had to do something similar: push a dataframe to S3 with a timestamp in the file name.
This is what I did:
# Import modules
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import boto3
from datetime import datetime
from io import StringIO

# Read recipe inputs
dataset = dataiku.Dataset("Dataset-Name")
data_df = dataset.get_dataframe()

# Get the current time
date = datetime.now().strftime("%m_%d_%Y-%H:%M:%S_%p")

# Put the dataframe into a buffer
csv_buffer = StringIO()
data_df.to_csv(csv_buffer)

# Create a session and connect to S3
session = boto3.Session(
    aws_access_key_id='your_access_id',
    aws_secret_access_key='your_secret_access_key',
)
s3_res = session.resource('s3')
bucket_name = 'your_s3_bucket_name'

# Set the file name with path and date
s3_object_name = f'path-to-output-folder/filename_{date}.csv'

# Push the file to S3
s3_res.Object(bucket_name, s3_object_name).put(Body=csv_buffer.getvalue())
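The snippet above hardcodes the AWS keys in the recipe. As a hedged alternative, boto3.Session() called without arguments lets boto3 look up credentials on its own (environment variables, the shared credentials file, or an instance role), so no secret has to be stored in the code. The helper names build_s3_key and put_csv and the key layout below are assumptions for illustration; the timestamp format is also changed to one without colons that sorts chronologically.

```python
from datetime import datetime


def build_s3_key(folder_prefix, base_name, now=None):
    """Build an S3 object key of the form '<prefix>/<base_name>_<timestamp>.csv'."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return "%s/%s_%s.csv" % (folder_prefix.rstrip("/"), base_name, stamp)


def put_csv(s3_resource, bucket_name, key, csv_text):
    """Upload CSV text to the given bucket and key using a boto3 S3 resource."""
    s3_resource.Object(bucket_name, key).put(Body=csv_text.encode("utf-8"))
    return key


# Possible use in the recipe above (requires boto3 and configured credentials):
#   import boto3
#   s3_res = boto3.Session().resource("s3")  # default credential lookup
#   key = build_s3_key("path-to-output-folder", "filename")
#   put_csv(s3_res, "your_s3_bucket_name", key, data_df.to_csv())
```

Passing the S3 resource in as a parameter keeps the two helpers easy to try out with a stand-in object before pointing them at a real bucket.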