Exporting a file with a dynamic name

Vinothkumar
Vinothkumar Registered Posts: 17 ✭✭✭✭

Hi,

I have my output as "Final_output" at the end of the flow. I want to export this to S3 as a CSV with the name "Final_output_$datetime.csv".

So every time the flow runs, it has to create a file with that timestamp. I tried using a variable, but it didn't work for the file name.

Thanks,

Vinothkumar M

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker
    edited July 17

    Hi

    DSS doesn't let you control the name of the files it produces, so you need a Python recipe that writes to a managed folder. For example:

    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd
    import os
    
    ds = dataiku.Dataset("...the dataset name")
    df = ds.get_dataframe()
    f = dataiku.Folder("...the folder id")
    # get_path() only works when the folder is on the local filesystem
    path = f.get_path()
    
    df.to_csv(os.path.join(path, "final_output_%s.csv" % dataiku.get_custom_variables()["datetime"]))

    or with an Export to folder recipe first, followed by a Python recipe that renames the file, like

    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd
    
    exported = dataiku.Folder("f")
    final = dataiku.Folder("g")
    csv_in_folder = [x for x in exported.list_paths_in_partition() if x.endswith('.csv')][0]
    with exported.get_download_stream(csv_in_folder) as s:
        data = s.read()
    final.upload_data("final_output_%s.csv" % dataiku.get_custom_variables()["datetime"], data)
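
The first snippet relies on `get_path()`, which to my knowledge only works for managed folders on the local filesystem. For a folder backed by S3, a minimal sketch could reuse the `upload_data` call from the second snippet instead. The helper name `export_csv` and its `stamp` argument are made up for illustration, `folder` is assumed to be a `dataiku.Folder`, and the commented lines only run inside a DSS Python recipe.

```python
def export_csv(folder, csv_text, stamp, prefix="final_output"):
    """Upload CSV text to a managed folder under a stamped file name.

    `folder` is assumed to expose upload_data(path, data), as dataiku.Folder does.
    """
    name = "%s_%s.csv" % (prefix, stamp)
    folder.upload_data(name, csv_text.encode("utf-8"))
    return name

# Inside a DSS Python recipe (placeholders kept from the answer above):
#   import dataiku
#   df = dataiku.Dataset("...the dataset name").get_dataframe()
#   stamp = dataiku.get_custom_variables()["datetime"]
#   export_csv(dataiku.Folder("...the folder id"), df.to_csv(index=False), stamp)
```

Because the data goes through the folder API rather than a filesystem path, the same call should work whichever connection backs the folder.
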
  • Vinothkumar
    Vinothkumar Registered Posts: 17 ✭✭✭✭

    @fchataigner2
    Thanks for your response. Works cool for the local drive.

  • sameerk007
    sameerk007 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1 ✭✭✭
    edited July 17

    I had to do something similar: push a dataframe to S3 with a timestamp in the file name.

    This is what I did:

    # Import modules
    import dataiku
    import boto3
    from datetime import datetime
    from io import StringIO

    # Read recipe inputs
    dataset = dataiku.Dataset("Dataset-Name")
    data_df = dataset.get_dataframe()

    # Get the time
    date = datetime.now().strftime("%m_%d_%Y-%H:%M:%S_%p")

    # Put the dataframe into a buffer
    csv_buffer = StringIO()
    data_df.to_csv(csv_buffer)

    # Connect to S3
    session = boto3.Session(
        aws_access_key_id='your_access_id',
        aws_secret_access_key='your_secret_access_key',
    )

    s3_res = session.resource('s3')  # Create an S3 resource from the session

    bucket_name = 'your_s3_bucket_name'

    # Set the object key with path and date
    s3_object_name = f'path-to-output-folder/filename_{date}.csv'

    # Push the file to S3
    s3_res.Object(bucket_name, s3_object_name).put(Body=csv_buffer.getvalue())
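
One detail worth checking is the timestamp format: `%H` is the 24-hour clock, so `%p` adds nothing, colons are not allowed in Windows file names if the object is later downloaded, and month-first dates do not sort chronologically. A possible alternative, sketched below with a hypothetical helper `s3_key`, uses a year-first, colon-free stamp. The prefix and base name are the placeholders from the post above.

```python
from datetime import datetime

def s3_key(prefix, base_name, now=None):
    """Build an object key such as 'prefix/base_2024-01-31_14-05-09.csv'."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")  # sorts chronologically, no colons
    return f"{prefix.rstrip('/')}/{base_name}_{stamp}.csv"

print(s3_key("path-to-output-folder", "filename", datetime(2024, 1, 31, 14, 5, 9)))
# path-to-output-folder/filename_2024-01-31_14-05-09.csv
```

The returned string could be passed as `s3_object_name` in the snippet above.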

  • Ashley
    Ashley Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 163 Dataiker

    Hi @Vinothkumar and @sameerk007 ,

    I wanted to let you know about new capabilities released in version 13.2 of Dataiku that may be helpful with this and other similar use cases: dynamic datasets and repeating recipes. This new advanced mode exists for a select number of visual recipes and lets you use parameters from a secondary dataset to configure settings in the recipe. It will run the recipe once for each row in the parameters dataset.

    For your case, you'll be able to enable the advanced mode of an "Export to Folder" recipe and add "Final_output_${datetime}.csv" as the file name you want. Connect a parameters dataset that contains the datetime of your flow run, and when you run the recipe, you'll find a file named, for example, "Final_output_2024-01-01.csv".
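
Conceptually, the recipe runs once per row of the parameters dataset and substitutes that row's values into the file name. The Python below only illustrates that behaviour and is not how DSS implements it; `expand_names` and the sample rows are invented for the example. Python's `string.Template` happens to use the same `${name}` placeholder syntax.

```python
from string import Template

def expand_names(pattern, parameter_rows):
    """Return one file name per parameters row, filling ${column} placeholders."""
    template = Template(pattern)
    return [template.substitute(row) for row in parameter_rows]

rows = [{"datetime": "2024-01-01"}]  # a one-row parameters dataset
print(expand_names("Final_output_${datetime}.csv", rows))
# ['Final_output_2024-01-01.csv']
```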

    If you'd like to try it, you can learn more in the Knowledge Base or try a hands-on tutorial.

    Cheers,

    Ashley

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
    edited November 7

    Thanks Ashley, I had not noticed this extra capability of repeating recipes. It's very cool. In fact I just tested it, and it can also be used to give exported files a dynamic name based on a value in the dataset, like a date! This works because I can add a Group By recipe on my input dataset to get, for instance, the max(date) of my data or now(), and then use that recipe's output as the parameters dataset of the repeating Export to folder recipe. I can then pass the variable value to the file name field. I don't even need to set the filter, as there is no need to filter in my use case: my parameters dataset will always have exactly 1 row. Finally, exports with dynamic file names are supported in visual recipes!
