Create a CSV named with today's datetime in a sync recipe, to upload a dataset from Dataiku into S3
As the subject says, I need to upload multiple CSVs (one per month) to an S3 folder connected to my project after a scenario is triggered. I can't use formulas or variables in the output CSV file name of the recipe (which should have the datetime as its name). Is there any option to solve this issue?
Thanks for the help
Jesus
Best Answer
Jesus Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 8
To anyone who wonders, this is the code I used to solve the problem:
# -*- coding: utf-8 -*-
import dataiku
from datetime import datetime

# Read recipe inputs
ds = dataiku.Dataset("dataiku_dataset_name")
df = ds.get_dataframe()

# Store the current month/year in a project variable so it can be reused
projectKey = dataiku.get_custom_variables()["projectKey"]
c = dataiku.api_client()
p = c.get_project(projectKey)
v = p.get_variables()
v["standard"]["current_month_year"] = datetime.now().strftime("%m_%Y")
p.set_variables(v)

# Specific code for a managed folder created in Dataiku
managed_folder_id = "managed_folder_id"

# Write recipe outputs
output_folder = dataiku.Folder(managed_folder_id)
output_folder.upload_stream(
    "Name_of__the_csv_%s.csv" % dataiku.get_custom_variables()["current_month_year"],
    df.to_csv(index=False).encode("utf-8"),
)

You have to create a managed folder in Dataiku with the S3 connection and the path in the bucket where you want the CSVs to be uploaded. You can also adjust the name of the CSV by selecting the exact timeframe you need with datetime.now().strftime().
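To illustrate the last point, here is a minimal sketch of a few strftime() patterns you could swap into the file name, depending on whether you want one file per month, per day, or a unique name per run (the "report_" prefix is just a placeholder, not part of the original code):

```python
from datetime import datetime

# A fixed timestamp so the examples are reproducible; in the recipe you
# would call datetime.now() instead.
ts = datetime(2023, 4, 15, 9, 30, 0)

monthly = "report_%s.csv" % ts.strftime("%m_%Y")        # one file per month
daily = "report_%s.csv" % ts.strftime("%Y-%m-%d")       # one file per day
per_run = "report_%s.csv" % ts.strftime("%Y%m%d_%H%M%S")  # unique per run

print(monthly)  # report_04_2023.csv
print(daily)    # report_2023-04-15.csv
print(per_run)  # report_20230415_093000.csv
```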