Hi,
I have my output as "Final_output" at the end of the flow. I want to export this into S3 as a CSV with the name "Final_output_$datetime.csv".
So every time the flow runs, it has to create a file with that timestamp. I tried with a variable, but it didn't work for the file name creation.
Thanks,
Vinothkumar M
Hi
DSS doesn't let you control the names of the files it produces, so you need a Python recipe writing to a managed folder to do it. For example, with a recipe like
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd
import os

ds = dataiku.Dataset("...the dataset name")
df = ds.get_dataframe()

f = dataiku.Folder("...the folder id")
path = f.get_path()
df.to_csv(os.path.join(path, "final_output_%s.csv" % dataiku.get_custom_variables()["datetime"]))
or with an Export to folder recipe first, followed by a Python recipe to rename the file, like
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd

exported = dataiku.Folder("f")
final = dataiku.Folder("g")

csv_in_folder = [x for x in exported.list_paths_in_partition() if x.endswith('.csv')][0]
with exported.get_download_stream(csv_in_folder) as s:
    data = s.read()
final.upload_data("final_output_%s.csv" % dataiku.get_custom_variables()["datetime"], data)
@fchataigner2 Thanks for your response. This works well for the local drive.
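Note that get_path() in the first snippet only applies when the managed folder sits on the local filesystem. For an S3-backed managed folder, a minimal sketch along the same lines could push the CSV through the Folder upload API instead; the dataset name "Final_output", the folder id "s3_output", and the "datetime" project variable below are assumptions to replace with your own:

# -*- coding: utf-8 -*-
# Sketch: write the CSV into an S3-backed managed folder via the Folder API,
# since get_path() is only available for local folders.
import dataiku
from io import StringIO

ds = dataiku.Dataset("Final_output")   # assumed dataset name
df = ds.get_dataframe()

folder = dataiku.Folder("s3_output")   # assumed managed folder id on an S3 connection

# Serialize the dataframe to an in-memory buffer
csv_buffer = StringIO()
df.to_csv(csv_buffer, index=False)

# Build the timestamped name from the same project variable used above
file_name = "final_output_%s.csv" % dataiku.get_custom_variables()["datetime"]

# upload_data writes the bytes to the folder's backend (here, S3)
folder.upload_data(file_name, csv_buffer.getvalue().encode("utf-8"))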
I had something similar to do: I had to push a dataframe to S3 with a timestamp in the file name.
This is what I did:
# Import modules
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import boto3
from datetime import datetime
from io import StringIO
# Read recipe inputs
dataset = dataiku.Dataset("Dataset-Name")
data_df = dataset.get_dataframe()
# Get the time
date = datetime.now().strftime("%m_%d_%Y-%H:%M:%S_%p")
#Put the dataframe into buffer
csv_buffer = StringIO()
data_df.to_csv(csv_buffer)
# Connect to s3
session = boto3.Session(
aws_access_key_id='your_access_key_id',
aws_secret_access_key='your_secret_access_key',
)
s3_res = session.resource('s3') # Create an S3 resource from the session
bucket_name = 'your_s3_bucket_name'
# set file name with path and date
s3_object_name = f'path-to-output-folder/filename_{date}.csv'
# Push the file to S3
s3_res.Object(bucket_name, s3_object_name).put(Body=csv_buffer.getvalue())
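If the DSS server already has AWS credentials available (for example through an instance profile or environment variables), a variant that skips the hard-coded keys and relies on boto3's default credential chain might look like this, reusing data_df from the snippet above; the bucket and key names are placeholders:

import boto3
from datetime import datetime
from io import StringIO

# Credentials come from boto3's default chain (IAM role, env vars, ~/.aws/credentials)
s3_client = boto3.client('s3')

# Underscores instead of ':' keep the key friendly for downstream tools
date = datetime.now().strftime("%m_%d_%Y-%H_%M_%S_%p")

csv_buffer = StringIO()
data_df.to_csv(csv_buffer, index=False)

s3_client.put_object(
    Bucket='your_s3_bucket_name',
    Key=f'path-to-output-folder/filename_{date}.csv',
    Body=csv_buffer.getvalue()
)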