Upload a new CSV file with a daily-changing suffix to SFTP using the Sync recipe

Solved!
vinayk
Level 2

Hello,

I am trying to upload a CSV file using the Sync recipe over the SFTP protocol. The flow runs on a daily basis and the recipe replaces the old file with the new one. Instead, I want every upload to create a new file whose name carries a suffix (mainly the date), so that the old file in the SFTP path is not replaced each day and a new file is uploaded daily.

I hope the question is clear. Kindly help me with this.

4 Replies
AlexT
Dataiker

Hi,

It is not possible to control the name of the file directly in the Sync or Export to folder recipe. However, you can change the output folder name through the SFTP connection:


1) Add the suffix to the connection using your variable name (make sure managed datasets are allowed on this SFTP connection).

[Screenshot 2022-05-26 at 11.05.59.png]
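For illustration only (hypothetical values; the screenshot above shows the actual field), the naming rule on the connection would reference the variable using DSS's usual variable expansion, e.g. a suffix of:

${date_suffix}

so that each run resolves the suffix to that day's date and the output folder name ends with it, as in the listing under step 3.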

2) Define date_suffix in a scenario, using now() and toString() with the date format you want. Since you are running this daily, you would likely want only yyyy-MM-dd.

[Screenshot 2022-05-26 at 11.24.40.png]
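If you prefer to set the variable from code rather than from the step shown in the screenshot, a custom Python scenario step along these lines should also work. This is only a sketch, assuming a project variable named date_suffix (the variable name and scope are assumptions, not necessarily what the screenshot uses):

import dataiku
from datetime import datetime

# Assumption: date_suffix is stored as a standard (project-level) variable
client = dataiku.api_client()
project = client.get_default_project()
variables = project.get_variables()
variables["standard"]["date_suffix"] = datetime.now().strftime("%Y-%m-%d")
project.set_variables(variables)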

3) Create the Sync recipe. Every time the recipe runs, it will create a folder named dataset_name plus the suffix you defined:

[dataiku@localhost ~]$ ls -ltr Intercom_conv*
Intercom_conv_lookup_table_copy_22022-05-26-11-24:
total 316
-rw-rw-r-- 1 dataiku dataiku 320035 May 26 10:36 out-s0.csv.gz

Intercom_conv_lookup_table_copy_22022-05-26-12-24:
total 316
-rw-rw-r-- 1 dataiku dataiku 320035 May 26 10:37 out-s0.csv.gz

The other alternative, if you want to control the name of the file ("out-s0.csv.gz"), is to use a managed folder rather than the SFTP dataset and use a Python recipe to write the timestamped files there.

Thanks,

vinayk
Level 2
Author

Thank you @AlexT for the quick response, this helps.

Is there a way to do this from a Python recipe, i.e. connect to SFTP and add the timestamp as a suffix?

If yes, can you also share a code snippet to do the same?

Thanks again.

AlexT
Dataiker

I would definitely advise against modifying the files/filenames inside a DSS-managed dataset path. It can create issues, e.g. you would see duplicate data when reading the dataset in DSS, since it points to the path and not to the exact file.

If you want to control the filename and not just the directory, use a managed folder on your SFTP connection as the output, keep your current dataset as the input, and create the file using a Python recipe:

import dataiku
import pandas as pd
from datetime import datetime

# ID of the managed folder that lives on the SFTP connection
managed_folder_id = "URKU7Oqb"
# Date suffix for the output filename, e.g. 2022-05-26
date = datetime.now().strftime("%Y-%m-%d")

# Read the input dataset into a dataframe
my_dataset = dataiku.Dataset("customers_labeled_prepared")
df = my_dataset.get_dataframe()

# Write the dataframe as a dated CSV file into the managed folder
output_folder = dataiku.Folder(managed_folder_id)
output_folder.upload_stream("filename" + date + ".csv", df.to_csv(index=False).encode("utf-8"))
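If the input dataset is large, a streaming variant may be worth considering. The sketch below reuses the same example folder ID and dataset name, and assumes the standard Folder.get_writer() and Dataset.iter_dataframes() APIs:

import dataiku
from datetime import datetime

date = datetime.now().strftime("%Y-%m-%d")
my_dataset = dataiku.Dataset("customers_labeled_prepared")
output_folder = dataiku.Folder("URKU7Oqb")

# Write the CSV into the managed folder chunk by chunk instead of building it all in memory
with output_folder.get_writer("filename" + date + ".csv") as writer:
    first_chunk = True
    for chunk in my_dataset.iter_dataframes(chunksize=100000):
        writer.write(chunk.to_csv(index=False, header=first_chunk).encode("utf-8"))
        first_chunk = False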

 

vinayk
Level 2
Author

Thanks a lot, I think the first solution would be feasible.
