SFTP - Placing file in desired directory without subfolders

MossandRoy, Dataiku DSS Core Designer, Registered Posts: 8

I have a business requirement to create a dataset and place it onto an SFTP server. I'm very close to meeting the requirement, except I am struggling to place the file where it is needed.

The requirement is to place the file into a specific folder, which I've entered as the root of my SFTP connection using the optional "Path from" parameter. Using a managed folder, I'm able to place the file at "root\projectcode\folderid\filename.csv", but what I'd like is to have it at "root\filename.csv".

How can I go about manipulating my project to achieve this desired outcome in an automated fashion? I have tried using a managed dataset on the SFTP connection, but that came with a subfolder structure of its own. Thank you in advance for your help!

Best Answer

  • MossandRoy, Dataiku DSS Core Designer, Registered Posts: 8
    edited July 2024 Answer ✓

    I was able to solve my problem, or at least it seems so. I am using a managed folder whose path points to the directory I want the file in. I'm then using a simple Python recipe to create the file that I need. A sample is below.

    It seems like there should be a way to do this with a visual recipe, but I've not figured out how. Fortunately, this works.

    # -*- coding: utf-8 -*-
    import dataiku

    # Read recipe inputs
    my_dataset = dataiku.Dataset("DatasetName")
    my_dataset_df = my_dataset.get_dataframe()

    # Set recipe outputs: a managed folder whose path is the target directory
    my_output = dataiku.Folder("FolderID")
    my_output_info = my_output.get_info()  # unused below; part of the default recipe template

    # File name as it should appear in the target directory
    filename = 'desiredfilename.csv'

    # Serialize the DataFrame to CSV (no index column) and upload it as UTF-8 bytes
    my_output.upload_data(filename, my_dataset_df.to_csv(index=False).encode("utf-8"))
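
    Since the goal is a file with no subfolders, the upload step in the recipe above could be wrapped in a small guard. This is only a minimal sketch: `upload_flat` and `FakeFolder` are hypothetical names introduced here for illustration, and the only thing assumed about the folder object is that it has the `upload_data(path, data)` method already used in the recipe.

```python
def upload_flat(folder, filename, text, encoding="utf-8"):
    """Upload `text` as `filename` directly under the folder's root.

    `folder` is anything with an `upload_data(path, data)` method,
    such as the managed folder object used in the recipe.
    """
    # A separator in the name would no longer be a plain file name,
    # so reject it instead of risking a nested path.
    if "/" in filename or "\\" in filename:
        raise ValueError("filename must not contain a path separator: %r" % filename)
    folder.upload_data(filename, text.encode(encoding))
    return filename


class FakeFolder:
    """Hypothetical stand-in for a managed folder, for trying the helper outside DSS."""

    def __init__(self):
        self.files = {}

    def upload_data(self, path, data):
        # Record what would have been written, keyed by path.
        self.files[path] = data


demo = FakeFolder()
upload_flat(demo, "desiredfilename.csv", "a,b\n1,2\n")
print(sorted(demo.files))
```

    Inside a recipe, the call would presumably look like `upload_flat(dataiku.Folder("FolderID"), filename, my_dataset_df.to_csv(index=False))`.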

Answers

  • Zach, Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 153

    Hi @MossandRoy,

    I think the issue you're running into is that the "Path from" connection parameter is set directly to the folder that you're trying to create the dataset in. DSS therefore treats that folder as the root of the connection, and writing directly to the root of a connection isn't recommended.

    Instead, I recommend either unsetting the "Path from" parameter, or setting it to a directory that's at least 1 level above your folder. For example, if you want to create a dataset in the directory "/root/dataset/", then set the "Path from" parameter to "/root".
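
    The "one level above" suggestion can be worked through with plain path handling. This is a minimal sketch using only Python's standard `posixpath` module; `split_target` is a hypothetical helper for illustration, not something DSS provides.

```python
import posixpath


def split_target(target_dir):
    """Split an absolute directory into (connection_root, path_below_root)."""
    # Normalise away trailing slashes: "/root/dataset/" becomes "/root/dataset".
    cleaned = posixpath.normpath(target_dir)
    parent, leaf = posixpath.split(cleaned)
    if not leaf:
        # "/" has no parent directory to use as the connection root.
        raise ValueError("target directory must be at least one level below '/'")
    return parent, "/" + leaf


print(split_target("/root/dataset/"))  # ('/root', '/dataset')
```

    Here the first value is what would go in "Path from", and the second is the path to set on the dataset or folder itself.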

    Additionally, from your description, it's not clear if you're trying to create a managed dataset, or if you're trying to create a managed folder.

    • If you want to create a dataset on your SFTP connection that's the output of another recipe in your Flow (such as the output of a Prepare recipe), then you can use a managed dataset.
    • If you want to copy an existing CSV file to your SFTP connection, then you can use a managed folder.

    Whether you go with a managed folder or a managed dataset, you can set the path of the folder/dataset in its connection settings, as shown in the screenshots below.

    Managed folder:

    [Screenshot: managed folder settings]

    Managed dataset:

    [Screenshot: managed dataset settings]

    Thanks,

    Zach

  • MossandRoy, Dataiku DSS Core Designer, Registered Posts: 8

    Hi @ZachM, and thanks for your reply!

    I'd like to create a file of my chosen name in my chosen directory. I searched the Dataiku Academy for a lesson on SSH/SFTP, since that seems to be what I really need right now, but couldn't find one.

    I took your suggestion: I removed the last directory from my connection path and then used that directory as the path of my managed dataset. Running the parent recipe appears to have wiped out the other folders in that directory and created a .csv.gz file. Perhaps this is expected behavior, but it doesn't make sense to me.

    Do I need to use Python to achieve my desired result in this instance? My requirement to deposit a specific file in a specific directory does not seem unique, and I know someone must have done this before me. Thanks for your help!

  • Zach, Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 153

    Hi @MossandRoy,

    Sorry, I should have specified. A managed dataset is meant to have a directory to itself, one that is otherwise empty. DSS will delete anything else that exists in the directory because it assumes that any existing files are just old versions of the data. This is expected behavior.

    For your use case of saving the dataset as a CSV file to a folder and ignoring existing files, I think an "Export to folder" recipe would work:

    [Screenshot: "Export to folder" recipe]

    The name of the exported file will be set to the name of the dataset.
