Taking Snowflake dataset to pipe delimited txt file on SFTP
I'm looking to take a Snowflake dataset (that I'm generating/refreshing using a scenario step) and put it into a pipe delimited txt file that I'll drop onto SFTP. Having some trouble sorting out the steps I need to take to get there. From what I've gathered, I need to do the following
1) Drop Snowflake dataset (generated by scenario) in my flow - Completed
2) Use 'Export to Folder' visual recipe to drop dataset there - Not Completed
3) Use Python visual recipe to use the data in the managed folder to output this to SFTP - Not Completed
Do these steps seem correct?
I've been leveraging a few prior solutions (https://community.dataiku.com/t5/Using-Dataiku/Create-a-file-in-sftp/m-p/8967) but still a bit stuck.
Operating system used: Windows
Best Answer
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi @ccecil
,"Export to folder" recipe does not support using a pipe-delimited output file to the managed folder.
Supported format are :So indeed you would need to use a Python recipe as you suggested. To write directly to an SFTP folder you can create a SFTP managed folder, first add the SFTP connection and then choose the output of your Python recipe to the respective managed folder/path. You may also opt to use directly a python library e.g pysftp/paramiko depending on your exact needs and sftp server.
The Python recipe would need to simply read and write out the using pandas and the delimited ""|".import dataiku import pandas as pd, numpy as np from dataiku import pandasutils as pdu from datetime import datetime #replace with your managed folder managed_folder_id = "URKU7Oqb" input_dataset_name = "customers_labeled_prepared" # Read dataset my_dataset = dataiku.Dataset(input_dataset_name) #if the dataset can't fit into memory you may opt to use chuncked reading/writting https://developer.dataiku.com/latest/concepts-and-examples/datasets/datasets-data.html#chunked-reading-and-writing-with-pandas df = my_dataset.get_dataframe() # Write recipe outputs output_folder = dataiku.Folder(managed_folder_id) output_folder.upload_stream("filename.txt , df.to_csv(index=False, sep='|').encode("utf-8"))
Answers
-
Thanks, @AlexT
. I leveraged code that almost mirrors what you provided to get this done. I appreciate it!