Exporting a file in dynamic name

Vinothkumar · October 2020

Hi,

I have my output as "Final_output" at the end of the flow. I want to export this into S3 as a csv with the name "Final_output_$datetime.csv"

So every time the flow runs, it has to create a file with that timestamp. I tried using a variable, but it didn't work for the file name.

Thanks,

Vinothkumar M

fchataigner2 · October 2020

Hi

DSS doesn't let you control the name of the files it produces, so you need a Python recipe that writes to a managed folder. For example:

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd
import os

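# Read the dataset to export
ds = dataiku.Dataset("...the dataset name")
df = ds.get_dataframe()

# get_path() only works for folders stored on the local filesystem
f = dataiku.Folder("...the folder id")
path = f.get_path()

# "datetime" must exist as a custom variable (for example a project variable)
df.to_csv(os.path.join(path, "final_output_%s.csv" % dataiku.get_custom_variables()["datetime"]))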

Alternatively, use an Export to folder recipe first, followed by a Python recipe that renames the file:

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd

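exported = dataiku.Folder("f")  # folder written by the Export to folder recipe
final = dataiku.Folder("g")     # folder that receives the renamed file

# Take the first CSV found in the export folder and read its bytes
csv_in_folder = [x for x in exported.list_paths_in_partition() if x.endswith('.csv')][0]
with exported.get_download_stream(csv_in_folder) as s:
    data = s.read()

# Write the same bytes under the timestamped name
final.upload_data("final_output_%s.csv" % dataiku.get_custom_variables()["datetime"], data)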
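
Since get_path() is limited to folders on the local filesystem, a variant of the first recipe that goes through upload_data() instead may be a better fit for an S3-backed managed folder. The following is only a minimal sketch under assumptions: upload_csv and FakeFolder are hypothetical names, and FakeFolder merely stands in for dataiku.Folder so the snippet can run outside DSS.

```python
def upload_csv(folder, csv_text, stamp, stem="final_output"):
    """Upload CSV text to a folder under '<stem>_<stamp>.csv'.

    `folder` only needs an upload_data(path, data) method, which is the
    call used on dataiku.Folder in the rename recipe above.
    """
    name = "%s_%s.csv" % (stem, stamp)
    folder.upload_data(name, csv_text.encode("utf-8"))
    return name


class FakeFolder:
    """Hypothetical stand-in for dataiku.Folder (keeps files in a dict)."""

    def __init__(self):
        self.files = {}

    def upload_data(self, path, data):
        self.files[path] = data


demo = FakeFolder()
written = upload_csv(demo, "a,b\n1,2\n", "2020-10-05")
print(written)  # final_output_2020-10-05.csv
```

Inside a recipe, the same function could presumably be called with dataiku.Folder("...the folder id"), df.to_csv(index=False) as the CSV text, and dataiku.get_custom_variables()["datetime"] as the stamp.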

Vinothkumar · October 2020

@fchataigner2
Thanks for your response. Works cool for the local drive.

sameerk007 · July 2022

I had to do something similar: push a dataframe to S3 with a timestamp in the file name.

This is what I did:

# Import modules
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import boto3
from datetime import datetime
from io import StringIO


# Read recipe inputs

dataset = dataiku.Dataset("Dataset-Name")
data_df = dataset.get_dataframe()

# Get the time

date = datetime.now().strftime("%m_%d_%Y-%H:%M:%S_%p")

# Put the dataframe into a buffer

csv_buffer = StringIO()
data_df.to_csv(csv_buffer)

# Connect to s3

session = boto3.Session(
    aws_access_key_id='your_access_id',
    aws_secret_access_key='your_secret_access_key',
)

s3_res = session.resource('s3')  # Create an S3 resource from the session


bucket_name = 'your_s3_bucket_name'

# set file name with path and date

s3_object_name = f'path-to-output-folder/filename_{date}.csv' 

# Push the file to S3

s3_res.Object(bucket_name, s3_object_name).put(Body=csv_buffer.getvalue())
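
The timestamp format above contains colons and mixes a 24-hour clock with an AM/PM marker. If the object keys should sort chronologically and stay easy to handle in URLs and shells, a compact UTC stamp is one option. This is a sketch only: s3_key is a hypothetical helper, and the prefix and file name are the placeholders from the code above.

```python
import posixpath
from datetime import datetime, timezone


def s3_key(prefix, stem, now):
    """Build '<prefix>/<stem>_<UTC stamp>.csv' from a timezone-aware datetime."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return posixpath.join(prefix, "%s_%s.csv" % (stem, stamp))


key = s3_key(
    "path-to-output-folder",
    "filename",
    datetime(2022, 7, 15, 14, 30, 5, tzinfo=timezone.utc),
)
print(key)  # path-to-output-folder/filename_20220715T143005Z.csv
```

In the recipe, the result could replace s3_object_name, with datetime.now(timezone.utc) passed as the third argument.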

Ashley · November 6

Hi @Vinothkumar and @sameerk007 ,

I wanted to let you know about new capabilities released in version 13.2 of Dataiku that may be helpful with this and other similar use cases: dynamic datasets and repeating recipes. This new advanced mode exists for a select number of visual recipes and lets you use parameters from a secondary dataset to configure settings in the recipe. It will run the recipe once for each row in the parameters dataset.

For your case, you'll be able to enable an advanced mode of an "Export to Folder" recipe and add "Final_output_${datetime}.csv" as the file name you want. Connect a parameters dataset that contains the datetime of your flow run, and when you run the recipe, you'll find a file called "Final_output_2024-01-01.csv"
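
As a rough mental model of the one-run-per-row behaviour, the file name can be pictured as a template filled in from each row of the parameters dataset. The snippet below is only an illustration in plain Python, not how DSS implements the feature; the row values are made up to match the example name above.

```python
from string import Template

# File name as it would be typed into the recipe's file name field
file_name_pattern = Template("Final_output_${datetime}.csv")

# Hypothetical parameters dataset: one row per run of the recipe
parameter_rows = [{"datetime": "2024-01-01"}]

# One output file name per row
file_names = [file_name_pattern.substitute(row) for row in parameter_rows]
print(file_names)  # ['Final_output_2024-01-01.csv']
```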

If you'd like to try it, you can learn more in the Knowledge Base or try a hands-on tutorial.

Cheers,

Ashley

Turribeach · November 7

https://community.dataiku.com/discussion/comment/44841#Comment_44841

Thanks Ashley, I had not noticed this extra capability of the repeating recipes. It's very cool. In fact, I just tested it, and it can also be used to give exported files a dynamic name based on a value in the dataset, like a date! This works because I can add a new group by recipe on my input dataset to get, for instance, the max(date) of my data or now(), and then use that recipe's output as the parameters dataset of the export to folder recipe. I can then pass the variable value to the file name field. I don't even need to set the filter, since there is nothing to filter in my use case: my parameters dataset will always have exactly one row. Finally, exports with dynamic file names are supported in visual recipes!
