Upload a new csv file with suffix to change on daily basis to SFTP using sync recipe.

vinayk
vinayk Partner, Dataiku DSS Core Designer, Registered Posts: 12 Partner

Hello,

I am trying to upload a csv file using the sync recipe with SFTP protocol, the flows run on daily basis and the recipe replaces the old file with the new file, but I want to create new file with every upload and give the file name with some suffix(mainly date) so that the old file is not replaced in SFTP path on daily basis, instead a new file is uploaded daily.

Hope the question is clear, Kindly help me to do this.

Best Answer

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,218 Dataiker
    Answer ✓

    Hi,

    It is not possible to control the name of file directly in the Sync or Export to folder recipe. You can rename the folder within the SFTP connection


    1) Add the suffix to the connection with you variable name ( make sure you have allowed managed datasets on this SFTP connection)
    Screenshot 2022-05-26 at 11.05.59.png

    2) Define date_suffix in a scenario, using now() and toString() with the date format you want you would likely want it to be only yyyy-MM-dd ( if you are running this daily)

    Screenshot 2022-05-26 at 11.24.40.png

    3) Create the Sync recipe and this will create the folder dataset_name+suffix defined every time the recipe is run:

    [dataiku@localhost ~]$ ls -ltr Intercom_conv*
    Intercom_conv_lookup_table_copy_22022-05-26-11-24:
    total 316
    -rw-rw-r-- 1 dataiku dataiku 320035 May 26 10:36 out-s0.csv.gz

    Intercom_conv_lookup_table_copy_22022-05-26-12-24:
    total 316
    -rw-rw-r-- 1 dataiku dataiku 320035 May 26 10:37 out-s0.csv.gz

    The other alternative if you want to control the name of the file "out-s0.csv.gz" then you would need to use a managed folder rather the SFTP dataset and use Python recipe to write the files with the timestamps there.

    Thanks,

Answers

  • vinayk
    vinayk Partner, Dataiku DSS Core Designer, Registered Posts: 12 Partner

    Thank you @AlexT
    for the quick response, this helps.

    Is there a way from python recipe we can do it to connect to SFTP and add the timestamp as suffix.

    If yes, can you also shared the code snippet to do the same.

    Thanks again.

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,218 Dataiker
    edited July 17

    I would definitely advise against modifying the files/ filename inside a DSS-managed dataset path. It can create issues, e.g you would see duplicate data when reading the dataset in DSS, since it would point to the path and not the exact file

    If you want you want to control the filename and not just the directory then use a managed folder on your SFTP connection as the output and the input as your current dataset and create a file using the python recipe :

    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu
    from datetime import datetime
    
    managed_folder_id = "URKU7Oqb"
    date = datetime.now().strftime("%Y-%m-%d")
    
    
    # Read dataset convert df to csv inst
    my_dataset = dataiku.Dataset("customers_labeled_prepared")
    df = my_dataset.get_dataframe()
    
    # Write recipe outputs
    output_folder = dataiku.Folder(managed_folder_id)
    output_folder.upload_stream("filename" + date + ".csv", df.to_csv(index=False).encode("utf-8"))

  • vinayk
    vinayk Partner, Dataiku DSS Core Designer, Registered Posts: 12 Partner

    Thanks a lot, I think first solution would be feasible.

Setup Info
    Tags
      Help me…