Export file into SFTP - using Python
Hi All,
I am new to Dataiku and am having trouble exporting datasets as CSV files to our VM (SFTP) server.
Here are the details:
I have a dataset in Dataiku called 'a1', and I used a Python recipe and created a managed folder called 'b1'. On the SFTP server, the path is "/app/testing/dataiku_test".
Here is the script that I am using in the Python recipe:
    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu

    # Read recipe inputs
    b1 = dataiku.Dataset("b1")
    b1_df = b1.get_dataframe()

    folder_name = dataiku.Folder("/app/testing/dataiku_test")
    folder_name_path = folder_name.get_path(ignore_flow=True)
    out_folder = dataiku.Folder("/app/testing/dataiku_test")
    filename = "some_file_name.csv"
    data = b1_df.to_csv(index=False)
    out_folder.upload_data(filename, data)
However I am getting the following error:
Job failed: Error in Python process: At line 12: <type 'exceptions.Exception'>: Managed folder /app/testing/dataiku_test cannot be used : declare it as input or output of your recipe
Best Answer
-
Keiji Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 52 Dataiker
Hello @pafj,
Thank you for the post on Community.
The parameter of `dataiku.Folder` should be the id or the name of your managed folder, not the underlying path of the folder. If your managed folder's name is 'b1', you can call `dataiku.Folder('b1')` to create an instance of the managed folder in Python.
Here is sample code to upload a csv file into a managed folder in Python 3.
    import dataiku

    DATASET_NAME = 'YOUR_DATASET_NAME'
    FOLDER_NAME = 'YOUR_FOLDER_NAME'
    FILE_NAME = 'YOUR_FILE_NAME'

    # Load the dataset into a pandas dataframe
    dataset = dataiku.Dataset(DATASET_NAME)
    df = dataset.get_dataframe()

    # Reference the managed folder by name or id, then upload the CSV as bytes
    folder = dataiku.Folder(FOLDER_NAME)
    folder.upload_data(FILE_NAME, df.to_csv(index=False).encode('utf8'))
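As a minimal sketch of the same idea, the upload step can be wrapped in a small helper. The function name `upload_dataframe_as_csv` and its `encoding` parameter are hypothetical, not part of the Dataiku API; the helper only assumes that the folder object exposes `upload_data(path, data)` as used above, which also lets you try it outside DSS with a stand-in object.

```python
def upload_dataframe_as_csv(df, folder, file_name, encoding="utf-8"):
    """Serialize `df` to CSV and upload it to `folder` under `file_name`.

    `folder` is any object exposing `upload_data(path, data)`; inside DSS
    this would be `dataiku.Folder('YOUR_FOLDER_NAME')`.
    Returns the number of bytes handed to the folder.
    """
    # to_csv() without a target path returns the CSV text as a str,
    # so encode it to give the folder bytes.
    payload = df.to_csv(index=False).encode(encoding)
    folder.upload_data(file_name, payload)
    return len(payload)


# Usage inside a DSS Python recipe (all names are placeholders):
#   import dataiku
#   df = dataiku.Dataset('YOUR_DATASET_NAME').get_dataframe()
#   upload_dataframe_as_csv(df, dataiku.Folder('YOUR_FOLDER_NAME'), 'YOUR_FILE_NAME')
```

Passing the folder in as an argument keeps the folder name (not the underlying SFTP path) in one place in the recipe.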
I hope this helps. Please let us know if you have any further questions.
Sincerely,
Keiji, Dataiku Technical Support
Answers
-
Hi Keiji,
Your solution worked, thank you!
Could you also give me a Python sample for importing files from SFTP into Dataiku?
Thanks a lot
-
Keiji Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 52 Dataiker
Hello @pafj,
Thank you for the confirmation.
Here is sample Python3 code for loading a CSV file from a DSS managed folder into a pandas dataframe.
    import dataiku
    import pandas as pd

    FOLDER_NAME = 'YOUR_FOLDER_NAME'
    FILE_NAME = 'YOUR_FILE_NAME'

    folder = dataiku.Folder(FOLDER_NAME)

    # Stream the file from the managed folder straight into pandas
    with folder.get_download_stream(FILE_NAME) as f:
        df = pd.read_csv(f)
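If the folder holds several CSV files, the same pattern can be looped. The sketch below is an illustration, not an official recipe: `load_csv_files` and its `read_csv` and `suffix` parameters are hypothetical names, and it assumes the folder object offers `list_paths_in_partition()` returning the file paths in the folder, alongside the `get_download_stream()` call shown above. The parser is passed in (for example `pd.read_csv`) so the helper depends only on the folder object.

```python
def load_csv_files(folder, read_csv, suffix=".csv"):
    """Read every file in `folder` whose path ends with `suffix`.

    `folder` must expose `list_paths_in_partition()` and
    `get_download_stream(path)`. `read_csv` is any callable that takes a
    binary stream, for example `pandas.read_csv`.
    Returns a dict mapping each matching path to the parsed result.
    """
    results = {}
    for path in folder.list_paths_in_partition():
        # Skip anything that is not a CSV file (case-insensitive match).
        if not path.lower().endswith(suffix):
            continue
        with folder.get_download_stream(path) as stream:
            results[path] = read_csv(stream)
    return results


# Usage inside a DSS Python recipe (names are placeholders):
#   import dataiku
#   import pandas as pd
#   frames = load_csv_files(dataiku.Folder('YOUR_FOLDER_NAME'), pd.read_csv)
```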
-
Hi @KeijiY
I wanted to ask a similar question about exporting an Excel file to the SFTP server or a SharePoint drive:
I am using the following Python script. The Excel file gets created in the folder, but it is 0 KB and cannot be opened. Am I missing anything in the script? I changed my Python version to 3.6 in the code environment.
    import dataiku
    import pandas as pd
    import xlsxwriter

    DATASET_NAME3 = 'aa'
    FOLDER_NAME3 = 'bb'

    dataset = dataiku.Dataset(DATASET_NAME3)
    df = dataset.get_dataframe()

    import time
    current_day = time.strftime("%Y-%m-%d")
    FILE_NAME3 = "aa_"+current_day+".xlsx"

    folder = dataiku.Folder(FOLDER_NAME3)
    writer = pd.ExcelWriter(folder,engine='xlsxwriter')
    folder.upload_data(FILE_NAME3, df.to_excel(writer,index=False, sheet_name='Sheet1'))
-
Keiji Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 52 Dataiker
Hello @pafj,
Thanks for another question.
Would you please try the following code?
    import time
    from io import BytesIO

    import dataiku
    import pandas as pd
    import xlsxwriter

    DATASET_NAME3 = 'aa'
    FOLDER_NAME3 = 'bb'

    dataset = dataiku.Dataset(DATASET_NAME3)
    df = dataset.get_dataframe()

    current_day = time.strftime("%Y-%m-%d")
    FILE_NAME3 = "aa_" + current_day + ".xlsx"

    folder = dataiku.Folder(FOLDER_NAME3)

    # Write the workbook into an in-memory buffer instead of a file on disk
    stream = BytesIO()
    writer = pd.ExcelWriter(stream, engine='xlsxwriter')
    df.to_excel(writer, index=False, sheet_name='Sheet1')
    # close() finalizes the workbook; writer.save() was removed in pandas 2.0
    writer.close()

    # Rewind the buffer so the upload starts from the first byte
    stream.seek(0)
    folder.upload_stream(FILE_NAME3, stream)
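The empty file in the earlier script most likely came from two things: `pd.ExcelWriter` was given the folder object rather than a path or buffer, and `df.to_excel(...)` returns `None`, so no workbook bytes ever reached `upload_data`. The `stream.seek(0)` call matters for a related reason, illustrated below with the standard library only. This is a sketch that assumes the uploader reads the stream from its current position; the payload is a stand-in for real workbook bytes.

```python
from io import BytesIO

stream = BytesIO()
stream.write(b"example payload")  # stands in for the Excel writer's output

# After writing, the position sits at the end of the buffer,
# so reading from here yields nothing.
at_end = stream.read()

# Rewinding makes the full content available to the next reader.
stream.seek(0)
after_rewind = stream.read()

print(len(at_end), len(after_rewind))
```

Without the rewind, a consumer that reads from the current position would see an empty stream even though the buffer holds data.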
Sincerely,
Keiji, Dataiku Technical Support
-
It worked. Thank you! You are the best!