Export file into SFTP - using Python

Dataiku DSS Core Designer, Registered Posts: 13 ✭✭✭
edited July 2024 in Using Dataiku

Hi All,

I am new to Dataiku and I am having trouble exporting a dataset as a CSV file to our VM (SFTP) server.

Here are the details:

I have a dataset in Dataiku called 'a1', and I used a Python recipe to create a managed folder called 'b1'. On the SFTP server, the target path is "/app/testing/dataiku_test".

Here is the script that I am using in the Python recipe:

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
b1= dataiku.Dataset("b1")
b1_df = b1.get_dataframe()


folder_name=dataiku.Folder("/app/testing/dataiku_test")
folder_name_path=folder_name.get_path(ignore_flow=True)


out_folder = dataiku.Folder("/app/testing/dataiku_test")
filename = "some_file_name.csv"
data = b1_df.to_csv(index=False)
out_folder.upload_data(filename, data)

However, I am getting the following error:

Job failed: Error in Python process: At line 12: <type 'exceptions.Exception'>: Managed folder /app/testing/dataiku_test cannot be used : declare it as input or output of your recipe

Best Answer

  • Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 52 Dataiker
    edited July 2024 Answer ✓

    Hello @pafj,

    Thank you for the post on Community.

    The parameter of `dataiku.Folder` should be the id or the name of your managed folder, not the underlying path of the folder. If your managed folder's name is 'b1', you can call `dataiku.Folder('b1')` to create an instance of the managed folder in Python.

    Here is sample Python 3 code that uploads a CSV file to a managed folder.

    import dataiku
    
    DATASET_NAME = 'YOUR_DATASET_NAME'
    FOLDER_NAME = 'YOUR_FOLDER_NAME'
    FILE_NAME = 'YOUR_FILE_NAME'
    
    dataset = dataiku.Dataset(DATASET_NAME)
    df = dataset.get_dataframe()
    
    folder = dataiku.Folder(FOLDER_NAME)
    folder.upload_data(FILE_NAME, df.to_csv(index=False).encode('utf8'))

    I hope this helps. Please let us know if you have any further questions.

    Sincerely,
    Keiji, Dataiku Technical Support
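
A note on the `.encode('utf8')` call in the answer above: in Python 3, `df.to_csv()` returns a `str` when no path is given, and `.encode()` turns that text into bytes before it is handed to `upload_data`. The following is a minimal sketch of that step on its own; the helper name `dataframe_to_csv_bytes` and the example DataFrame are illustrative assumptions, not part of the Dataiku API.

```python
import pandas as pd


def dataframe_to_csv_bytes(df, encoding="utf-8"):
    """Serialize a DataFrame to CSV text, then encode it to bytes.

    Hypothetical helper for illustration; not part of the dataiku package.
    """
    text = df.to_csv(index=False)  # returns a str because no path was given
    return text.encode(encoding)   # bytes, ready to be uploaded


# Illustrative data standing in for dataset.get_dataframe()
example = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
payload = dataframe_to_csv_bytes(example)

# Inside a DSS recipe this would be used as (not executed here):
#   folder = dataiku.Folder(FOLDER_NAME)
#   folder.upload_data(FILE_NAME, payload)
```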

Answers

  • Dataiku DSS Core Designer, Registered Posts: 13 ✭✭✭

    Hi Keiji,

    Your solution worked, thank you!

    Could you also give me a Python sample for importing files from SFTP into Dataiku?

    Thanks a lot

  • Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 52 Dataiker
    edited July 2024

    Hello @pafj,

    Thank you for the confirmation.

    Here is sample Python 3 code for loading a CSV file from a DSS managed folder into a pandas dataframe.

    import dataiku
    import pandas as pd
    
    FOLDER_NAME = 'YOUR_FOLDER_NAME'
    FILE_NAME = 'YOUR_FILE_NAME'
    
    folder = dataiku.Folder(FOLDER_NAME)
    with folder.get_download_stream(FILE_NAME) as f:
        df = pd.read_csv(f)
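
If the managed folder holds several CSV files, the same pattern can be applied per file. The sketch below assumes the folder object exposes `list_paths_in_partition()` and `get_download_stream()`, as `dataiku.Folder` does; the function name `read_csvs_from_folder` and the `suffix` parameter are hypothetical, and the function accepts any object with those two methods so that it can be tried outside DSS.

```python
import pandas as pd


def read_csvs_from_folder(folder, suffix=".csv"):
    """Load every file ending in `suffix` from a folder-like object.

    Hypothetical helper: `folder` is assumed to provide
    list_paths_in_partition() and get_download_stream(path).
    Returns a dict mapping each file path to its DataFrame.
    """
    frames = {}
    for path in folder.list_paths_in_partition():
        # Case-insensitive match so that "data.CSV" is also picked up
        if path.lower().endswith(suffix):
            with folder.get_download_stream(path) as stream:
                frames[path] = pd.read_csv(stream)
    return frames


# Inside a DSS recipe this would be used as (not executed here):
#   import dataiku
#   frames = read_csvs_from_folder(dataiku.Folder('YOUR_FOLDER_NAME'))
```

The keys of the returned dict are the paths exactly as the folder reports them, so they can be passed back to `get_download_stream` unchanged.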

  • Dataiku DSS Core Designer, Registered Posts: 13 ✭✭✭
    edited July 2024

    Hi @KeijiY

    I have a similar question about exporting an Excel file to the SFTP server or a SharePoint drive:

    I am using the following Python script. The Excel file gets created in the folder, but it is 0 KB and cannot be opened. Am I missing anything in the script? I changed Python to 3.6 in the code environment.

    import dataiku
    import pandas as pd
    import xlsxwriter
    
    DATASET_NAME3 = 'aa'
    FOLDER_NAME3 = 'bb'
    
    
    dataset = dataiku.Dataset(DATASET_NAME3)
    df = dataset.get_dataframe()
    
    import time
    current_day = time.strftime("%Y-%m-%d")
    
    FILE_NAME3 = "aa_"+current_day+".xlsx"
    
    folder = dataiku.Folder(FOLDER_NAME3)
    
    writer = pd.ExcelWriter(folder,engine='xlsxwriter')
    folder.upload_data(FILE_NAME3, df.to_excel(writer,index=False, sheet_name='Sheet1'))

  • Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 52 Dataiker
    edited July 2024

    Hello @pafj,

    Thanks for another question.

    Would you please try the following code?

    import dataiku
    import pandas as pd
    import xlsxwriter
    from io import BytesIO
    
    DATASET_NAME3 = 'aa'
    FOLDER_NAME3 = 'bb'
    
    dataset = dataiku.Dataset(DATASET_NAME3)
    df = dataset.get_dataframe()
    
    import time
    current_day = time.strftime("%Y-%m-%d")
    
    FILE_NAME3 = "aa_"+current_day+".xlsx"
    
    folder = dataiku.Folder(FOLDER_NAME3)
    
    stream = BytesIO()
    
    writer = pd.ExcelWriter(stream, engine='xlsxwriter')
    df.to_excel(writer, index=False, sheet_name='Sheet1')
    writer.close()  # writes the workbook into the stream; writer.save() was removed in pandas 2.0
    stream.seek(0)
    
    folder.upload_stream(FILE_NAME3, stream)

    Sincerely,
    Keiji, Dataiku Technical Support
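
Some background on why the earlier script produced a 0 KB file: `df.to_excel(...)` returns `None`, so nothing was passed to `upload_data`, and `pd.ExcelWriter` expects a file path or a binary buffer rather than a `dataiku.Folder` object. A minimal sketch of the in-memory approach follows, written with a context manager so the writer is closed automatically. The helper name `dataframe_to_xlsx_bytes` is an assumption for illustration, and the code relies on an Excel engine (xlsxwriter or openpyxl) being installed in the code environment.

```python
from io import BytesIO

import pandas as pd


def dataframe_to_xlsx_bytes(df, sheet_name="Sheet1"):
    """Write a DataFrame to an in-memory .xlsx workbook and return its bytes.

    Hypothetical helper for illustration; not part of the dataiku package.
    """
    buffer = BytesIO()
    # Leaving the with-block closes the writer, which flushes the
    # workbook contents into the buffer.
    with pd.ExcelWriter(buffer) as writer:
        df.to_excel(writer, index=False, sheet_name=sheet_name)
    return buffer.getvalue()


# Inside a DSS recipe this would be used as (not executed here):
#   folder = dataiku.Folder(FOLDER_NAME3)
#   folder.upload_data(FILE_NAME3, dataframe_to_xlsx_bytes(df))
```

Returning bytes keeps the upload step identical to the CSV case, at the cost of holding the whole workbook in memory.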

  • Dataiku DSS Core Designer, Registered Posts: 13 ✭✭✭

    It worked. Thank you! You are the best!
