Export file into SFTP - using Python

pafj, Dataiku DSS Core Designer
edited July 16 in Using Dataiku

Hi All,

I am new to Dataiku, and I am having trouble exporting datasets as CSV files to our VM (SFTP) server.

Here are the details:

I have a dataset in Dataiku called 'a1', and I used a Python recipe and created a managed folder called 'b1'. On the SFTP server, the target path is "/app/testing/dataiku_test".

Here is the script I am using in the Python recipe:

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
b1= dataiku.Dataset("b1")
b1_df = b1.get_dataframe()


folder_name=dataiku.Folder("/app/testing/dataiku_test")
folder_name_path=folder_name.get_path(ignore_flow=True)


out_folder = dataiku.Folder("/app/testing/dataiku_test")
filename = "some_file_name.csv"
data = b1_df.to_csv(index=False)
out_folder.upload_data(filename, data)

However, I am getting the following error:

Job failed: Error in Python process: At line 12: <type 'exceptions.Exception'>: Managed folder /app/testing/dataiku_test cannot be used : declare it as input or output of your recipe

Best Answer

  • Keiji, Dataiker (Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer)
    edited July 17

    Hello @pafj,

    Thank you for the post on Community.

    The argument to `dataiku.Folder` should be the ID or the name of your managed folder, not the folder's underlying path. If your managed folder is named 'b1', you can call `dataiku.Folder('b1')` to get an instance of it in Python. As the error message says, the folder must also be declared as an input or output of your recipe.

    Here is sample code to upload a CSV file into a managed folder in Python 3:

    import dataiku
    
    DATASET_NAME = 'YOUR_DATASET_NAME'
    FOLDER_NAME = 'YOUR_FOLDER_NAME'
    FILE_NAME = 'YOUR_FILE_NAME'
    
    dataset = dataiku.Dataset(DATASET_NAME)
    df = dataset.get_dataframe()
    
    folder = dataiku.Folder(FOLDER_NAME)
    folder.upload_data(FILE_NAME, df.to_csv(index=False).encode('utf8'))

    I hope this helps. Please let us know if you have any further questions.

    Sincerely,
    Keiji, Dataiku Technical Support

Answers

  • pafj, Dataiku DSS Core Designer

    Hi Keiji,

    Your solution worked, thank you!

    Could you also give me a Python sample for importing files from SFTP into Dataiku?

    Thanks a lot

  • Keiji, Dataiker
    edited July 17

    Hello @pafj,

    Thank you for the confirmation.

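    Here is sample Python 3 code for loading a CSV file from a DSS managed folder into a pandas DataFrame. If the managed folder is stored on your SFTP connection, this reads the file from the SFTP server.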

    import dataiku
    import pandas as pd
    
    FOLDER_NAME = 'YOUR_FOLDER_NAME'
    FILE_NAME = 'YOUR_FILE_NAME'
    
    folder = dataiku.Folder(FOLDER_NAME)
    with folder.get_download_stream(FILE_NAME) as f:
        df = pd.read_csv(f)
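    To check the upload/download round trip locally, here is a minimal pandas-only sketch that needs no DSS instance. It assumes only what the two answers show: `upload_data` receives UTF-8 encoded CSV bytes, and `get_download_stream` yields a binary file-like object. Here `io.BytesIO` stands in for the managed folder, and the sample frame is made up for illustration.

```python
from io import BytesIO

import pandas as pd

# Small illustrative frame standing in for dataset.get_dataframe()
df = pd.DataFrame({"id": [1, 2, 3], "city": ["Paris", "Tokyo", "Lima"]})

# What the upload answer passes to folder.upload_data(): UTF-8 encoded CSV bytes
payload = df.to_csv(index=False).encode("utf8")

# BytesIO stands in for the binary stream returned by folder.get_download_stream()
with BytesIO(payload) as f:
    restored = pd.read_csv(f)

print(restored)
```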

  • pafj, Dataiku DSS Core Designer
    edited July 17

    Hi @KeijiY,

    I have a similar question about exporting an Excel file to the SFTP server or a SharePoint drive.

    I am using the following Python script. The Excel file gets created in the folder, but it is 0 KB and won't open. Am I missing anything in the script? I changed the Python version in my code environment to 3.6.

    import dataiku
    import pandas as pd
    import xlsxwriter
    
    DATASET_NAME3 = 'aa'
    FOLDER_NAME3 = 'bb'
    
    
    dataset = dataiku.Dataset(DATASET_NAME3)
    df = dataset.get_dataframe()
    
    import time
    current_day = time.strftime("%Y-%m-%d")
    
    FILE_NAME3 = "aa_"+current_day+".xlsx"
    
    folder = dataiku.Folder(FOLDER_NAME3)
    
    writer = pd.ExcelWriter(folder,engine='xlsxwriter')
    folder.upload_data(FILE_NAME3, df.to_excel(writer,index=False, sheet_name='Sheet1'))

  • Keiji, Dataiker
    edited July 17

    Hello @pafj,

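    Thanks for the follow-up question.

    In your script, `pd.ExcelWriter` is given the `Folder` object instead of a file path or buffer, the writer is never saved, and `df.to_excel()` returns `None`, so `upload_data` never receives the workbook contents. That is most likely why the file ends up empty.

    Would you please try the following code? It writes the workbook into an in-memory `BytesIO` buffer first and then uploads that buffer.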

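    import time
    from io import BytesIO

    import dataiku
    import pandas as pd
    import xlsxwriter  # the 'xlsxwriter' engine must be installed in the code environment

    DATASET_NAME3 = 'aa'
    FOLDER_NAME3 = 'bb'

    dataset = dataiku.Dataset(DATASET_NAME3)
    df = dataset.get_dataframe()

    # Date-stamped file name, e.g. aa_2024-07-17.xlsx
    current_day = time.strftime("%Y-%m-%d")
    FILE_NAME3 = "aa_" + current_day + ".xlsx"

    folder = dataiku.Folder(FOLDER_NAME3)

    # Write the workbook into an in-memory buffer instead of a file path
    stream = BytesIO()

    # Leaving the 'with' block closes (and saves) the writer;
    # writer.save() was removed in pandas 2.0
    with pd.ExcelWriter(stream, engine='xlsxwriter') as writer:
        df.to_excel(writer, index=False, sheet_name='Sheet1')

    # Rewind the buffer so the upload reads it from the beginning
    stream.seek(0)

    folder.upload_stream(FILE_NAME3, stream)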
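    The `stream.seek(0)` step matters because reading from a `BytesIO` buffer starts at its current position, which after writing is the end of the buffer. A minimal stdlib sketch of that behaviour follows; the placeholder bytes are not a real workbook, and it assumes (as the code above implies) that `upload_stream` reads from the stream's current position.

```python
from io import BytesIO

# An in-memory binary buffer, like the one the workbook is written into
stream = BytesIO()
stream.write(b"workbook bytes")  # placeholder bytes, not a real .xlsx file

# After writing, the position sits at the end of the buffer,
# so a consumer that calls read() now gets nothing
unread = stream.read()

# Rewinding to the start makes the full content readable again
stream.seek(0)
content = stream.read()

print(len(unread), len(content))
```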

    Sincerely,
    Keiji, Dataiku Technical Support

  • pafj, Dataiku DSS Core Designer

    It worked. Thank you, you are the best!
