Create a CSV named with today's datetime in a sync recipe, to upload a dataset from Dataiku into S3
As the subject says, I need to upload multiple CSVs (one per month) to an S3 folder connected to my project after a scenario is triggered. I can't use formulas or variables in the output CSV file name of the recipe (which should have the datetime as its name). Is there any option to solve this issue?
Thanks for the help
Jesus
Best Answer
Jesus Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 8
To anyone who wonders, this is the code I used to solve the problem:
# -*- coding: utf-8 -*-
import dataiku
from datetime import datetime

# Read recipe inputs
ds = dataiku.Dataset("dataiku_dataset_name")
df = ds.get_dataframe()

# Store the current month/year in a project variable so it can be reused
projectKey = dataiku.get_custom_variables()["projectKey"]
c = dataiku.api_client()
p = c.get_project(projectKey)
v = p.get_variables()
v["standard"]["current_month_year"] = datetime.now().strftime("%m_%Y")
p.set_variables(v)

# Specific code for a managed folder created in Dataiku
managed_folder_id = "managed_folder_id"

# Write recipe outputs
output_folder = dataiku.Folder(managed_folder_id)
output_folder.upload_stream(
    "Name_of__the_csv_%s.csv" % dataiku.get_custom_variables()["current_month_year"],
    df.to_csv(index=False).encode("utf-8"),
)

You have to create a managed folder in Dataiku with the S3 connection and the path in the bucket where you want the CSVs to be uploaded. You can also adjust the name of the CSV by selecting the exact timeframe you need with datetime.now().strftime().
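To illustrate the last point, here is a minimal sketch of a few strftime() patterns you could swap into the file name, depending on whether you want one file per month, per day, or a unique name per run (the "report_" prefix is just a placeholder, not part of the original code):

```python
from datetime import datetime

# A fixed timestamp so the examples are reproducible; in the recipe you
# would call datetime.now() instead.
ts = datetime(2023, 4, 15, 9, 30, 0)

monthly = "report_%s.csv" % ts.strftime("%m_%Y")        # one file per month
daily = "report_%s.csv" % ts.strftime("%Y-%m-%d")       # one file per day
per_run = "report_%s.csv" % ts.strftime("%Y%m%d_%H%M%S")  # unique per run

print(monthly)  # report_04_2023.csv
print(daily)    # report_2023-04-15.csv
print(per_run)  # report_20230415_093000.csv
```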