Export file into SFTP - using Python

Solved!
pafj
Level 3
Hi All,

I am new to Dataiku and am having trouble exporting a dataset as a CSV file to our VM (SFTP) server.

Here are the details:

I have a dataset in Dataiku called 'a1'. I used a Python recipe and created a managed folder called 'b1'. On the SFTP server, the path is "/app/testing/dataiku_test".

Here is the script that I am using in the Python recipe:

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
b1= dataiku.Dataset("b1")
b1_df = b1.get_dataframe()


folder_name=dataiku.Folder("/app/testing/dataiku_test")
folder_name_path=folder_name.get_path(ignore_flow=True)


out_folder = dataiku.Folder("/app/testing/dataiku_test")
filename = "some_file_name.csv"
data = b1_df.to_csv(index=False)
out_folder.upload_data(filename, data)

 

However, I am getting the following error:

Job failed: Error in Python process: At line 12: <type 'exceptions.Exception'>: Managed folder /app/testing/dataiku_test cannot be used : declare it as input or output of your recipe

KeijiY
Dataiker

Hello @pafj ,

Thank you for the post on Community.

The parameter of `dataiku.Folder` should be the id or the name of your managed folder, not the underlying path of the folder. If your managed folder's name is 'b1', you can call `dataiku.Folder('b1')` to create an instance of the managed folder in Python.

Here is sample code to upload a CSV file into a managed folder in Python 3.

 

import dataiku

DATASET_NAME = 'YOUR_DATASET_NAME'
FOLDER_NAME = 'YOUR_FOLDER_NAME'
FILE_NAME = 'YOUR_FILE_NAME'

dataset = dataiku.Dataset(DATASET_NAME)
df = dataset.get_dataframe()

folder = dataiku.Folder(FOLDER_NAME)
folder.upload_data(FILE_NAME, df.to_csv(index=False).encode('utf8'))
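
To confirm that the file actually landed in the folder, the upload can be wrapped in a small helper that lists the folder contents afterwards. This is only a sketch: `upload_csv_text` is a hypothetical helper name, and it assumes that `folder` is a `dataiku.Folder` (or any object exposing `upload_data` and `list_paths_in_partition`) and that listed paths may start with "/".

```python
def upload_csv_text(folder, file_name, csv_text):
    """Encode CSV text as UTF-8, upload it, and check that it is listed.

    `folder` is assumed to behave like dataiku.Folder: it must provide
    upload_data(path, data) and list_paths_in_partition().
    """
    payload = csv_text.encode("utf-8")  # upload bytes, as in the sample above
    folder.upload_data(file_name, payload)

    # Compare without leading slashes, since listed paths may start with "/".
    listed = {p.lstrip("/") for p in folder.list_paths_in_partition()}
    if file_name.lstrip("/") not in listed:
        raise RuntimeError("%s was not found in the folder after upload" % file_name)
    return len(payload)
```

In a recipe this might be called as `upload_csv_text(dataiku.Folder('b1'), 'some_file_name.csv', df.to_csv(index=False))`, with 'b1' declared as an output of the recipe.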

 

I hope this helps. Please let us know if you have any further questions.

Sincerely,
Keiji, Dataiku Technical Support

pafj
Level 3
Author

Hi Keiji,

Your solution worked, thank you!

Could you also give me a Python sample for importing files from SFTP into Dataiku?

Thanks a lot

KeijiY
Dataiker

Hello @pafj,

Thank you for the confirmation.

Here is sample Python 3 code for loading a CSV file from a DSS managed folder into a pandas dataframe.

 

import dataiku
import pandas as pd

FOLDER_NAME = 'YOUR_FOLDER_NAME'
FILE_NAME = 'YOUR_FILE_NAME'

folder = dataiku.Folder(FOLDER_NAME)
with folder.get_download_stream(FILE_NAME) as f:
    df = pd.read_csv(f)
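
If the folder holds several files, the same pattern can be repeated per file. The sketch below is an assumption about how one might do it, not part of the sample above: `read_files_with_suffix` is a hypothetical helper, and `folder` is assumed to behave like `dataiku.Folder`, offering `list_paths_in_partition()` and `get_download_stream(path)`.

```python
def read_files_with_suffix(folder, suffix=".csv"):
    """Return {path: raw bytes} for every file in the folder ending with `suffix`."""
    contents = {}
    for path in sorted(folder.list_paths_in_partition()):
        # Match the extension case-insensitively and skip everything else.
        if not path.lower().endswith(suffix):
            continue
        # The stream is used as a context manager, as in the sample above.
        with folder.get_download_stream(path) as stream:
            contents[path] = stream.read()
    return contents
```

Each value can then be parsed with `pd.read_csv(io.BytesIO(data))`.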

 

pafj
Level 3
Author

Hi @KeijiY 

I wanted to ask a similar question about exporting an Excel file to the SFTP server or a SharePoint drive:

I am using the following Python script. The Excel file gets created in the folder, but it is 0 KB and won't open. Do you think I am missing anything in the script? I changed my Python version to 3.6 in the code environment.

import dataiku
import pandas as pd
import xlsxwriter

DATASET_NAME3 = 'aa'
FOLDER_NAME3 = 'bb'


dataset = dataiku.Dataset(DATASET_NAME3)
df = dataset.get_dataframe()

import time
current_day = time.strftime("%Y-%m-%d")

FILE_NAME3 = "aa_"+current_day+".xlsx"

folder = dataiku.Folder(FOLDER_NAME3)

writer = pd.ExcelWriter(folder,engine='xlsxwriter')
folder.upload_data(FILE_NAME3, df.to_excel(writer,index=False, sheet_name='Sheet1'))

 

 

KeijiY
Dataiker

Hello @pafj ,

Thanks for another question.

Would you please try the following code?

import time
from io import BytesIO

import dataiku
import pandas as pd
import xlsxwriter  # engine used by pd.ExcelWriter below

DATASET_NAME3 = 'aa'
FOLDER_NAME3 = 'bb'

dataset = dataiku.Dataset(DATASET_NAME3)
df = dataset.get_dataframe()

current_day = time.strftime("%Y-%m-%d")
FILE_NAME3 = "aa_" + current_day + ".xlsx"

folder = dataiku.Folder(FOLDER_NAME3)

# Write the workbook into an in-memory buffer, not into the folder object
stream = BytesIO()

writer = pd.ExcelWriter(stream, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name='Sheet1')
writer.close()  # finalizes the workbook; writer.save() was removed in pandas 2.0
stream.seek(0)  # rewind so the upload starts at the first byte

folder.upload_stream(FILE_NAME3, stream)
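
The `stream.seek(0)` call matters because, after writing, the buffer's position sits at the end of the data, and a reader that starts from the current position would see nothing. The snippet below illustrates this with plain `io.BytesIO` and placeholder bytes standing in for the workbook. It only illustrates the buffer behaviour, under the assumption that the upload reads from the stream's current position.

```python
from io import BytesIO

stream = BytesIO()
stream.write(b"placeholder workbook bytes")  # stand-in for the Excel writer output

at_end = stream.read()      # position is at the end, so nothing is read
stream.seek(0)              # rewind to the first byte
from_start = stream.read()  # now the full content is read

print(len(at_end), len(from_start))
```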

 

Sincerely,
Keiji, Dataiku Technical Support

pafj
Level 3
Author

It worked. Thank you! You are the best!