Upload a new CSV file with a daily-changing suffix to SFTP using a Sync recipe.
Hello,
I am uploading a CSV file over SFTP using a Sync recipe. The flow runs daily and the recipe replaces the old file with the new one. Instead, I want every run to create a new file, with a suffix (mainly the date) in the file name, so that the old file in the SFTP path is not overwritten and a new file is uploaded each day.
I hope the question is clear. Kindly help me do this.
Best Answer
-
Alexandru (Dataiker)
Hi,
Hi,
It is not possible to control the file name directly in the Sync or Export to folder recipe, but you can control the folder name within the SFTP connection:
1) Add the suffix with your variable name to the connection's path (make sure managed datasets are allowed on this SFTP connection).
2) Define date_suffix in a scenario, using now() and toString() with the date format you want; if you are running this daily, you would likely want only yyyy-MM-dd (see the Python sketch after the listing below for an alternative way to set it).
3) Create the Sync recipe; every time it runs, it will create a folder named dataset_name plus the suffix you defined:
[dataiku@localhost ~]$ ls -ltr Intercom_conv*
Intercom_conv_lookup_table_copy_22022-05-26-11-24:
total 316
-rw-rw-r-- 1 dataiku dataiku 320035 May 26 10:36 out-s0.csv.gz
Intercom_conv_lookup_table_copy_22022-05-26-12-24:
total 316
-rw-rw-r-- 1 dataiku dataiku 320035 May 26 10:37 out-s0.csv.gz
The other alternative, if you want to control the name of the file itself ("out-s0.csv.gz"), is to use a managed folder rather than an SFTP dataset and write the files with the timestamps there using a Python recipe.
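For step 2 above, instead of a formula-based "Define variables" step you can also set the variable from a custom Python step in the scenario. A minimal sketch, assuming the variable is named date_suffix and you want the yyyy-MM-dd format (adjust both to your needs):
from datetime import datetime
from dataiku.scenario import Scenario
# Set a scenario variable that the SFTP connection naming rule can reference as ${date_suffix}
Scenario().set_scenario_variables(date_suffix=datetime.now().strftime("%Y-%m-%d"))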
Thanks,
Answers
-
Thank you @AlexT for the quick response, this helps. Is there a way to do it from a Python recipe, i.e. connect to SFTP and add the timestamp as a suffix?
If yes, could you also share a code snippet to do the same?
Thanks again.
-
Alexandru (Dataiker)
I would definitely advise against modifying the files/filenames inside a DSS-managed dataset path. It can create issues; e.g., you would see duplicate data when reading the dataset in DSS, since the dataset points to the path and not to an exact file.
If you want to control the file name and not just the directory, use a managed folder on your SFTP connection as the output, keep your current dataset as the input, and create the file in a Python recipe:
import dataiku
from datetime import datetime

managed_folder_id = "URKU7Oqb"
date = datetime.now().strftime("%Y-%m-%d")

# Read the input dataset into a dataframe
my_dataset = dataiku.Dataset("customers_labeled_prepared")
df = my_dataset.get_dataframe()

# Write the dataframe as a timestamped CSV file into the output managed folder
output_folder = dataiku.Folder(managed_folder_id)
output_folder.upload_stream("filename" + date + ".csv", df.to_csv(index=False).encode("utf-8"))
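If useful, you can quickly verify the uploads from a notebook or another recipe by listing the folder contents (reusing the folder ID from the snippet above):
import dataiku
folder = dataiku.Folder("URKU7Oqb")
# Lists all file paths currently in the managed folder, e.g. ['/filename2022-05-26.csv', ...]
print(folder.list_paths_in_partition())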
-
Thanks a lot, I think the first solution would be feasible.