Exporting a file with a dynamic name
Hi,
My output at the end of the flow is "Final_output". I want to export it to S3 as a CSV named "Final_output_$datetime.csv".
So every time the flow runs, it has to create a file with that timestamp. I tried using a variable, but it didn't work for the file name.
Thanks,
Vinothkumar M
Answers
-
Hi
DSS doesn't let you control the names of the files it produces, so you need a Python recipe that writes to a managed folder. For example:
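# -*- coding: utf-8 -*-
import os

import dataiku
import pandas as pd

# Read the input dataset into a dataframe
ds = dataiku.Dataset("...the dataset name")
df = ds.get_dataframe()

# Locate the managed folder on disk
f = dataiku.Folder("...the folder id")
path = f.get_path()

# Write the CSV, using the "datetime" custom variable in the file name
df.to_csv(os.path.join(path, "final_output_%s.csv" % dataiku.get_custom_variables()["datetime"]))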
Alternatively, use an Export to folder recipe first, followed by a Python recipe that renames the file, for example:
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd

# "f" holds the exported file, "g" receives the renamed copy
exported = dataiku.Folder("f")
final = dataiku.Folder("g")

# Pick the first CSV found in the export folder
csv_in_folder = [x for x in exported.list_paths_in_partition() if x.endswith('.csv')][0]

# Read its content and upload it under the new name
with exported.get_download_stream(csv_in_folder) as s:
    data = s.read()
final.upload_data("final_output_%s.csv" % dataiku.get_custom_variables()["datetime"], data)
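Note that the first snippet relies on get_path(), which is only available when the managed folder lives on the local filesystem. For a folder on an S3 connection, a minimal sketch along these lines may work instead: it builds the CSV in memory and sends it with upload_data, as the second snippet does. The helper names (timestamped_name, upload_csv) and the timestamp format are illustrative assumptions, and the timestamp is computed in Python rather than read from a custom variable.

```python
from datetime import datetime


def timestamped_name(prefix, when=None, ext="csv"):
    """Build a name such as final_output_2024-01-01_12-30-00.csv (hypothetical helper)."""
    when = when or datetime.now()
    return "%s_%s.%s" % (prefix, when.strftime("%Y-%m-%d_%H-%M-%S"), ext)


def upload_csv(folder, df, prefix="final_output", when=None):
    """Serialize df to CSV in memory and upload it to a managed folder.

    `folder` is expected to expose upload_data(path, data), like dataiku.Folder;
    `df` is expected to be a pandas DataFrame.
    """
    name = timestamped_name(prefix, when)
    data = df.to_csv(index=False).encode("utf-8")  # to_csv() without a path returns a string
    folder.upload_data(name, data)
    return name


# Inside a DSS Python recipe (dataset name and folder id are placeholders):
# import dataiku
# df = dataiku.Dataset("...the dataset name").get_dataframe()
# upload_csv(dataiku.Folder("...the folder id"), df)
```

Because nothing here touches a local path, the same recipe should behave the same way whether the folder is local or on S3.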
-
@fchataigner2
Thanks for your response. It works well for the local drive.
-
sameerk007 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1 ✭✭✭
I had to do something similar: push a dataframe to S3 with a timestamp in the file name.
This is what I did:
# Import modules
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import boto3
from datetime import datetime
from io import StringIO

# Read recipe inputs
dataset = dataiku.Dataset("Dataset-Name")
data_df = dataset.get_dataframe()

# Get the current time as a string for the file name
date = datetime.now().strftime("%m_%d_%Y-%H:%M:%S_%p")

# Write the dataframe into an in-memory buffer
csv_buffer = StringIO()
data_df.to_csv(csv_buffer)

# Create a session and connect to S3
session = boto3.Session(
    aws_access_key_id='your_access_id',
    aws_secret_access_key='your_secret_access_key',
)
s3_res = session.resource('s3')

bucket_name = 'your_s3_bucket_name'

# Set the object name with path and date
s3_object_name = f'path-to-output-folder/filename_{date}.csv'

# Push the file to S3
s3_res.Object(bucket_name, s3_object_name).put(Body=csv_buffer.getvalue())
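A variation on the above, offered as a sketch rather than a drop-in replacement: it keeps the access keys out of the recipe by letting boto3 resolve credentials through its default chain (environment variables, shared config, or an instance role), and it uses a timestamp without colons so the object key is easier to handle in URLs and on filesystems that reject ":". The function name push_csv and the key layout are assumptions for illustration.

```python
from datetime import datetime
from io import StringIO


def push_csv(s3_client, bucket, key_prefix, df, when=None):
    """Upload df as CSV to s3://bucket/<key_prefix>_<timestamp>.csv and return the key.

    `s3_client` is expected to expose put_object(Bucket=..., Key=..., Body=...),
    like the client returned by boto3.client("s3").
    """
    when = when or datetime.now()
    key = "%s_%s.csv" % (key_prefix, when.strftime("%Y-%m-%d_%H-%M-%S"))

    # Serialize the dataframe into an in-memory text buffer
    buf = StringIO()
    df.to_csv(buf, index=False)

    s3_client.put_object(Bucket=bucket, Key=key, Body=buf.getvalue().encode("utf-8"))
    return key


# Usage in a recipe (bucket and prefix are placeholders):
# import boto3
# s3 = boto3.client("s3")  # no keys in code; resolved by boto3's default credential chain
# push_csv(s3, "your_s3_bucket_name", "path-to-output-folder/filename", data_df)
```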
-
Ashley Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 163 Dataiker
Hi @Vinothkumar and @sameerk007 ,
I wanted to let you know about new capabilities released in version 13.2 of Dataiku that may be helpful with this and other similar use cases: dynamic datasets and repeating recipes. This new advanced mode exists for a select number of visual recipes and lets you use parameters from a secondary dataset to configure settings in the recipe. It will run the recipe once for each row in the parameters dataset.
For your case, you'll be able to enable the advanced mode of an "Export to Folder" recipe and set "Final_output_${datetime}.csv" as the file name. Connect a parameters dataset that contains the datetime of your flow run, and when you run the recipe, you'll find a file called "Final_output_2024-01-01.csv".
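To make the "once per row" behaviour concrete, here is a plain Python illustration of the idea. It is not DSS code and says nothing about how DSS implements it; it only mimics substituting ${column} placeholders with the values of each row of a parameters dataset. The function name expand_file_names is hypothetical.

```python
from string import Template


def expand_file_names(pattern, parameter_rows):
    """Return one file name per parameters row, filling ${column} placeholders."""
    template = Template(pattern)
    return [template.substitute(row) for row in parameter_rows]


# A parameters dataset with a single row and a "datetime" column
rows = [{"datetime": "2024-01-01"}]
print(expand_file_names("Final_output_${datetime}.csv", rows))
# prints ['Final_output_2024-01-01.csv']
```

With several rows in the parameters dataset, the same pattern would yield one file name per row.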
If you'd like to try it, you can learn more in the Knowledge Base or try a hands-on tutorial.
Cheers,
Ashley
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Thanks Ashley, I had not noticed this extra capability of repeating recipes. It's very cool. In fact, I just tested it, and it can also be used to give exported files a dynamic name based on a value in the dataset, like a date! This works because I can add a Group recipe on my input dataset to get, for instance, the max(date) of my data or now(), and then use its output as the repeating recipe's parameters dataset on the Export to folder recipe. I can then pass the variable value to the file name field. I don't even need to set the filter, since there is nothing to filter in my use case: my parameters dataset will always have exactly 1 row. Finally, exports with dynamic file names are supported in visual recipes!