Taking Snowflake dataset to pipe delimited txt file on SFTP

Solved!
ccecil
Level 3
Taking Snowflake dataset to pipe delimited txt file on SFTP

I'm looking to take a Snowflake dataset (that I'm generating/refreshing using a scenario step) and put it into a pipe delimited txt file that I'll drop onto SFTP. Having some trouble sorting out the steps I need to take to get there. From what I've gathered, I need to do the following

1) Drop Snowflake dataset (generated by scenario) in my flow - Completed

2) Use 'Export to Folder' visual recipe to drop dataset there - Not Completed

3) Use Python visual recipe to use the data in the managed folder to output this to SFTP - Not Completed

 

Do these steps seem correct? 

I've been leveraging a few prior solutions (https://community.dataiku.com/t5/Using-Dataiku/Create-a-file-in-sftp/m-p/8967) but still a bit stuck. 


Operating system used: Windows

0 Kudos
1 Solution
AlexT
Dataiker

Hi @ccecil ,

"Export to folder" recipe does not support using a pipe-delimited output file to the managed folder.
Supported format are :
Screenshot 2023-06-04 at 10.28.24 PM.png

So indeed you would need to use a Python recipe as you suggested. To write directly to an SFTP folder you can create a SFTP managed folder, first add the SFTP connection and then choose the output of your Python recipe to the respective managed folder/path. You may also opt to use directly a python library e.g pysftp/paramiko depending on your exact needs and sftp server.

 The Python recipe would need to simply read and write out the using pandas and the delimited ""|".

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
from datetime import datetime

#replace with your managed folder
managed_folder_id = "URKU7Oqb"
input_dataset_name = "customers_labeled_prepared"

# Read dataset 
my_dataset = dataiku.Dataset(input_dataset_name)
#if the dataset can't fit into memory you may opt to use chuncked reading/writting https://developer.dataiku.com/latest/concepts-and-examples/datasets/datasets-data.html#chunked-reading-and-writing-with-pandas

df = my_dataset.get_dataframe()

# Write recipe outputs
output_folder = dataiku.Folder(managed_folder_id)
output_folder.upload_stream("filename.txt , df.to_csv(index=False, sep='|').encode("utf-8"))

 

View solution in original post

2 Replies
AlexT
Dataiker

Hi @ccecil ,

"Export to folder" recipe does not support using a pipe-delimited output file to the managed folder.
Supported format are :
Screenshot 2023-06-04 at 10.28.24 PM.png

So indeed you would need to use a Python recipe as you suggested. To write directly to an SFTP folder you can create a SFTP managed folder, first add the SFTP connection and then choose the output of your Python recipe to the respective managed folder/path. You may also opt to use directly a python library e.g pysftp/paramiko depending on your exact needs and sftp server.

 The Python recipe would need to simply read and write out the using pandas and the delimited ""|".

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
from datetime import datetime

#replace with your managed folder
managed_folder_id = "URKU7Oqb"
input_dataset_name = "customers_labeled_prepared"

# Read dataset 
my_dataset = dataiku.Dataset(input_dataset_name)
#if the dataset can't fit into memory you may opt to use chuncked reading/writting https://developer.dataiku.com/latest/concepts-and-examples/datasets/datasets-data.html#chunked-reading-and-writing-with-pandas

df = my_dataset.get_dataframe()

# Write recipe outputs
output_folder = dataiku.Folder(managed_folder_id)
output_folder.upload_stream("filename.txt , df.to_csv(index=False, sep='|').encode("utf-8"))

 

ccecil
Level 3
Author

Thanks, @AlexT . I leveraged code that almost mirrors what you provided to get this done. I appreciate it!

0 Kudos