Create a CSV named with today's datetime in a sync recipe to upload a dataset from Dataiku into S3

Jesus Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 8 ✭✭✭✭

As the subject says, I need to upload multiple CSVs (one per month) to an S3 folder connected to my project after a scenario is triggered. I can't use formulas or variables in the output CSV file name of the recipe (which should have the datetime as its name). Is there any option to solve this issue?

Thanks for the help

Jesus

Best Answer

  • Jesus
    Answer ✓

    For anyone who wonders, this is the code I used to solve the problem:


    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd
    from datetime import datetime

    # Read recipe input
    ds = dataiku.Dataset("dataiku_dataset_name")
    df = ds.get_dataframe()

    # Store the current month/year in a project variable so it can be reused elsewhere
    projectKey = dataiku.get_custom_variables()["projectKey"]
    client = dataiku.api_client()
    project = client.get_project(projectKey)
    variables = project.get_variables()
    current_month_year = datetime.now().strftime("%m_%Y")
    variables["standard"]["current_month_year"] = current_month_year
    project.set_variables(variables)

    # ID of a managed folder created in Dataiku, pointing at the S3 location
    managed_folder_id = "managed_folder_id"

    # Write recipe output: upload the dataframe as a CSV named with the current month/year
    output_folder = dataiku.Folder(managed_folder_id)
    output_folder.upload_stream(
        "Name_of__the_csv_%s.csv" % current_month_year,
        df.to_csv(index=False).encode("utf-8"),
    )

    You have to create a managed folder in Dataiku pointing at the S3 location and the in-bucket path where you want the CSVs to be uploaded. You can also adjust the name of the CSV by choosing the exact timeframe you need with datetime.now().strftime().
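    Since the original question asks for one CSV per month, a variation is to split the dataframe on a date column and name each file with the same %m_%Y pattern the code above stores in current_month_year. This is only a sketch: the sample data and the "date" column name are assumptions, and in the actual recipe df would come from ds.get_dataframe() with the upload done by output_folder.upload_stream:

    ```python
    import pandas as pd

    # Hypothetical sample data; in DSS this would come from ds.get_dataframe(),
    # and "date" is an assumed column name in your dataset.
    df = pd.DataFrame({
        "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-11"]),
        "value": [1, 2, 3],
    })

    # One CSV per month: group on a month/year key formatted like the
    # current_month_year variable in the answer above (%m_%Y).
    for month_year, chunk in df.groupby(df["date"].dt.strftime("%m_%Y")):
        filename = "Name_of__the_csv_%s.csv" % month_year
        data = chunk.to_csv(index=False).encode("utf-8")
        # In the recipe, this is where you would call
        # output_folder.upload_stream(filename, data)
        # to push each monthly file to the S3-backed managed folder.
        print(filename, len(data))
    ```

    With the sample data this produces two files, one for January and one for February 2024, each containing only that month's rows.
    
    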
