Create a csv with today datetime on a sync recipe to upload a dataset from dataiku into S3

Solved!
Jesus
Level 2

As the subject says, I need to upload multiple CSVs (one per month) to an S3 folder connected to my project after a scenario is triggered. I can't use formulas or variables in the output CSV file name of the sync recipe (the name should contain the datetime). Is there any option to solve this issue?

Thanks for the help

Jesus

1 Solution
Jesus
Level 2
Author

For anyone who wonders, this is the code I used to solve the problem:


# -*- coding: utf-8 -*-
import dataiku
from datetime import datetime

# Read recipe inputs
ds = dataiku.Dataset("dataiku_dataset_name")
df = ds.get_dataframe()

# Store the current month/year in a project variable so it can be reused elsewhere
projectKey = dataiku.get_custom_variables()["projectKey"]
client = dataiku.api_client()
project = client.get_project(projectKey)
variables = project.get_variables()
variables["standard"]["current_month_year"] = datetime.now().strftime("%m_%Y")
project.set_variables(variables)

managed_folder_id = "managed_folder_id"  # ID of the managed folder created in Dataiku on the S3 connection

# Serialize the dataframe to CSV bytes once
csv_bytes = df.to_csv(index=False).encode("utf-8")

# Write recipe outputs: upload the CSV with the month/year in its name
output_folder = dataiku.Folder(managed_folder_id)
output_folder.upload_stream(
    "Name_of_the_csv_%s.csv" % dataiku.get_custom_variables()["current_month_year"],
    csv_bytes,
)
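
If it helps, here is a minimal sketch (assuming the same managed_folder_id as above) to confirm the file landed in the folder after the recipe runs:

import dataiku

# List the files currently in the managed folder (paths are relative to the folder root)
folder = dataiku.Folder("managed_folder_id")
print(folder.list_paths_in_partition())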

 

You have to create a managed folder in Dataiku that points to the S3 connection and the path in the bucket where you want the CSVs to be uploaded. You can also adjust the name of the CSV by choosing the exact timeframe you need with datetime.now().strftime().
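
For reference, a few illustrative strftime() patterns depending on how often the scenario runs (these format strings are just examples, not part of the original recipe):

from datetime import datetime

now = datetime.now()
print(now.strftime("%m_%Y"))        # e.g. 04_2024 -- month_year, as in the recipe above
print(now.strftime("%Y-%m-%d"))     # e.g. 2024-04-15 -- one file per day
print(now.strftime("%Y%m%d_%H%M")) # e.g. 20240415_0930 -- down to the minute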
