File from SFTP to Dataiku csv/excel on daily basis.
vinayk
Partner, Dataiku DSS Core Designer, Registered Posts: 12 Partner
Hello Community,
I need a help in regards to one scenario, where in we will be updating the csv/excel file in SFTP server and that updated file has to uploaded to Dataiku on daily basis, what is the way to achieve this.
Any help would be much appreciated.
Thank you.
Best Answer
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Hi @vinayk
,OK, so, three options:
- If you have a "Files-based" SCP/SFTP connection to the machine where this file lives in, create a new dataset using this connection and the "New SFTP dataset" option.
- Be sure to show the "advanced options" and explicitly select the file you want to use as the source of your new dataset.
- To be sure that the file gets to be updated on daily basis, I'd create a Sync recipe, and then be sure to make it run in your scenario.
- If your instance doesn't have a "SCP/SFTP" connection configured, but you have the url and credentials, use the "Download Recipe" to get the files from the remote server, with the "Download from HTTP or FTP URL".
- This will download the files into a managed folder, from where you will need to create a new dataset using the downloaded file
- Use a python recipe to read the csv/excel file from the remote server, use pandas to convert it into a dataframe, and store it in a dataset. Using the paramiko library, this could be done like this:
import dataiku import paramiko import pandas as pd client_sftp = paramiko.SSHClient() client_sftp.set_missing_host_key_policy(paramiko.AutoAddPolicy()) client_sftp.connect('host.name', username='user', password='password') sftp = client_sftp.open_sftp() # if the file is csv: tmp_df = pd.read_csv(sftp.open('path_to_the_csv_file/filename.csv')) result_dataset = dataiku.Dataset("result_dataset") result_dataset.write_with_schema(tmp_df)
Hope this helps!
- If you have a "Files-based" SCP/SFTP connection to the machine where this file lives in, create a new dataset using this connection and the "New SFTP dataset" option.
Answers
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Hi @vinayk
,Is it a single file or multiple files? Does the name change?
Using the download recipe might be a good option, but somehow I do prefer to use a python recipe to do the trick.
While you give us some more details, maybe you can also find useful information in this post:
Cheers!
-
Thank you @Ignacio_Toledo
, that is a single file and name remains same. -
Thank you @Ignacio_Toledo
, this helps.