
Exporting a file with a dynamic name

Vinothkumar
Level 2

Hi,

The last dataset in my flow is "Final_output". I want to export it to S3 as a CSV file named "Final_output_$datetime.csv".

Every time the flow runs, it should create a new file with the current timestamp. I tried using a variable, but that didn't work for the file name.


Thanks,

Vinothkumar M

fchataigner2
Dataiker

Hi

DSS doesn't let you control the names of the files it produces, so you need a Python recipe that writes to a managed folder. For example:

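# -*- coding: utf-8 -*-
import dataiku
import pandas as pd
import os

ds = dataiku.Dataset("...the dataset name")
df = ds.get_dataframe()
f = dataiku.Folder("...the folder id")
# get_path() only works for managed folders stored on the local filesystem
path = f.get_path()

# "datetime" must be defined as a variable (e.g. project or scenario variable)
file_name = "final_output_%s.csv" % dataiku.get_custom_variables()["datetime"]
# index=False: don't write the pandas row index as an extra column
df.to_csv(os.path.join(path, file_name), index=False)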
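Note that `dataiku.get_custom_variables()["datetime"]` returns whatever value the `datetime` variable currently resolves to, so the file name only changes between runs if something updates that variable. A minimal alternative, assuming it is acceptable to take the timestamp from the recipe's own clock, is to compute it in Python. `build_output_name` is a hypothetical helper for illustration, not part of the Dataiku API:

```python
from datetime import datetime
from typing import Optional


def build_output_name(prefix: str, when: Optional[datetime] = None) -> str:
    """Return a name like 'final_output_20240102_030405.csv'.

    Hypothetical helper: uses the current local time when `when` is None.
    """
    if when is None:
        when = datetime.now()
    # Digits and underscores only, so the name is safe on local disks and in S3 keys
    return f"{prefix}_{when:%Y%m%d_%H%M%S}.csv"


# Inside the recipe you could then write, for example:
# df.to_csv(os.path.join(path, build_output_name("final_output")), index=False)
print(build_output_name("final_output", datetime(2024, 1, 2, 3, 4, 5)))
# prints final_output_20240102_030405.csv
```

Computing the name inside the recipe means each run gets its own timestamp without any scenario step having to set a variable first.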

Alternatively, use an Export to folder recipe first, followed by a Python recipe that renames the file:

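# -*- coding: utf-8 -*-
import dataiku

# "f": id of the folder filled by the Export to folder recipe; "g": id of the final folder
exported = dataiku.Folder("f")
final = dataiku.Folder("g")
# Take the first CSV file the export produced
csv_in_folder = [x for x in exported.list_paths_in_partition() if x.endswith('.csv')][0]
# Streams don't need a local path, so this also works for non-local folders such as S3
with exported.get_download_stream(csv_in_folder) as s:
    data = s.read()
final.upload_data("final_output_%s.csv" % dataiku.get_custom_variables()["datetime"], data)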
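This second recipe copies the exported CSV under a new name through `get_download_stream` and `upload_data`, which, unlike `get_path()`, do not require the folders to live on the local filesystem. The same find-then-copy logic can be sketched with plain local directories standing in for the two managed folders. `copy_first_csv` is a hypothetical helper, and unlike the `[0]` indexing above it fails with an explicit message when no CSV is present:

```python
import tempfile
from pathlib import Path


def copy_first_csv(src_dir: Path, dst_dir: Path, new_name: str) -> Path:
    """Copy the first .csv file found in src_dir to dst_dir under new_name."""
    # Sort so the choice is deterministic when several CSV files exist
    candidates = sorted(p for p in src_dir.iterdir() if p.suffix == ".csv")
    if not candidates:
        raise FileNotFoundError(f"no .csv file in {src_dir}")
    target = dst_dir / new_name
    # Read the bytes and write them back out, like download stream + upload_data
    target.write_bytes(candidates[0].read_bytes())
    return target


# Usage with throwaway directories standing in for the folders "f" and "g"
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    src, dst = Path(a), Path(b)
    (src / "export.csv").write_text("id,value\n1,42\n")
    result = copy_first_csv(src, dst, "final_output_20240102.csv")
    print(result.name, result.read_text().splitlines())
```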
Vinothkumar
Level 2
Author

@fchataigner2 Thanks for your response. It works well for the local drive.

sameerk007
Level 1

I had to do something similar: push a DataFrame to S3 with a timestamp in the file name.

This is what I did:

# Import modules
import dataiku
import pandas as pd
import boto3
from datetime import datetime
from io import StringIO


# Read recipe inputs

dataset = dataiku.Dataset("Dataset-Name")
data_df = dataset.get_dataframe()

# Get the time (%I is the 12-hour clock, which matches the %p AM/PM marker)

date = datetime.now().strftime("%m_%d_%Y-%I:%M:%S_%p")

# Put the dataframe into an in-memory buffer (index=False skips the row index)

csv_buffer = StringIO()
data_df.to_csv(csv_buffer, index=False)

# Connect to S3. Avoid hard-coding real keys: boto3.Session() with no
# arguments falls back to the standard AWS credential chain.

session = boto3.Session(
    aws_access_key_id='your_access_id',
    aws_secret_access_key='your_secret_access_key',
)

s3_res = session.resource('s3')  # S3 service resource from the session

bucket_name = 'your_s3_bucket_name'

# Set the object key (path + file name) with the date

s3_object_name = f'path-to-output-folder/filename_{date}.csv'

# Push the file to S3

s3_res.Object(bucket_name, s3_object_name).put(Body=csv_buffer.getvalue())
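The colons in that timestamp are allowed in S3 keys but become awkward once the object is downloaded to some filesystems (Windows rejects `:` in file names). The parts that don't need AWS, the object key and the CSV body, can be built and tested on their own. The sketch below is a minimal version assuming plain rows instead of a DataFrame; it uses only the standard library (`csv` in place of pandas) and a hypothetical `build_s3_payload` helper, and the resulting pair is what you would pass to `s3_res.Object(bucket_name, key).put(Body=body)`:

```python
import csv
import io
from datetime import datetime


def build_s3_payload(header, rows, prefix, when):
    """Return (object_key, csv_text) for a timestamped CSV upload.

    Hypothetical helper: `prefix` is the path part of the key,
    e.g. 'path-to-output-folder/filename'.
    """
    # Colon-free timestamp so the key also works as a Windows file name
    key = f"{prefix}_{when:%Y-%m-%d_%H-%M-%S}.csv"
    buffer = io.StringIO()
    # csv.writer defaults to '\r\n' line endings; use '\n' instead
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return key, buffer.getvalue()


key, body = build_s3_payload(
    ["id", "value"], [[1, 42], [2, 7]],
    "path-to-output-folder/filename", datetime(2024, 1, 2, 15, 4, 5),
)
print(key)  # path-to-output-folder/filename_2024-01-02_15-04-05.csv
print(body)
```

Keeping the key and body construction separate from the upload makes it easy to check the file name format without touching a real bucket.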