Taking Snowflake dataset to pipe delimited txt file on SFTP

Options
ccecil
ccecil Registered Posts: 17 ✭✭✭✭

I'm looking to take a Snowflake dataset (that I'm generating/refreshing using a scenario step) and put it into a pipe delimited txt file that I'll drop onto SFTP. Having some trouble sorting out the steps I need to take to get there. From what I've gathered, I need to do the following

1) Drop Snowflake dataset (generated by scenario) in my flow - Completed

2) Use 'Export to Folder' visual recipe to drop dataset there - Not Completed

3) Use Python visual recipe to use the data in the managed folder to output this to SFTP - Not Completed

Do these steps seem correct?

I've been leveraging a few prior solutions (https://community.dataiku.com/t5/Using-Dataiku/Create-a-file-in-sftp/m-p/8967) but still a bit stuck.


Operating system used: Windows

Best Answer

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker
    edited July 17 Answer ✓
    Options

    Hi @ccecil
    ,

    "Export to folder" recipe does not support using a pipe-delimited output file to the managed folder.
    Supported format are :
    Screenshot 2023-06-04 at 10.28.24 PM.png

    So indeed you would need to use a Python recipe as you suggested. To write directly to an SFTP folder you can create a SFTP managed folder, first add the SFTP connection and then choose the output of your Python recipe to the respective managed folder/path. You may also opt to use directly a python library e.g pysftp/paramiko depending on your exact needs and sftp server.

    The Python recipe would need to simply read and write out the using pandas and the delimited ""|".

    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu
    from datetime import datetime
    
    #replace with your managed folder
    managed_folder_id = "URKU7Oqb"
    input_dataset_name = "customers_labeled_prepared"
    
    # Read dataset 
    my_dataset = dataiku.Dataset(input_dataset_name)
    #if the dataset can't fit into memory you may opt to use chuncked reading/writting https://developer.dataiku.com/latest/concepts-and-examples/datasets/datasets-data.html#chunked-reading-and-writing-with-pandas
    
    df = my_dataset.get_dataframe()
    
    # Write recipe outputs
    output_folder = dataiku.Folder(managed_folder_id)
    output_folder.upload_stream("filename.txt , df.to_csv(index=False, sep='|').encode("utf-8"))

Answers

Setup Info
    Tags
      Help me…