Hi All,
I am new to Dataiku and am having trouble exporting datasets as CSV files to our VM (SFTP) server.
Here are the details:
I have a dataset in Dataiku called 'a1', and I used a Python recipe and created a managed folder called 'b1'. The path on the SFTP server is "/app/testing/dataiku_test".
Here is the script that I am using in the Python recipe:
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Read recipe inputs
b1= dataiku.Dataset("b1")
b1_df = b1.get_dataframe()
folder_name=dataiku.Folder("/app/testing/dataiku_test")
folder_name_path=folder_name.get_path(ignore_flow=True)
out_folder = dataiku.Folder("/app/testing/dataiku_test")
filename = "some_file_name.csv"
data = b1_df.to_csv(index=False)
out_folder.upload_data(filename, data)
However, I am getting the following error:
Hello @pafj ,
Thank you for the post on Community.
The parameter of `dataiku.Folder` should be the id or the name of your managed folder, not the underlying path of the folder. If your managed folder's name is 'b1', you can call `dataiku.Folder('b1')` to create an instance of the managed folder in Python.
Here is sample code to upload a csv file into a managed folder in Python 3.
import dataiku
DATASET_NAME = 'YOUR_DATASET_NAME'
FOLDER_NAME = 'YOUR_FOLDER_NAME'
FILE_NAME = 'YOUR_FILE_NAME'
dataset = dataiku.Dataset(DATASET_NAME)
df = dataset.get_dataframe()
folder = dataiku.Folder(FOLDER_NAME)
folder.upload_data(FILE_NAME, df.to_csv(index=False).encode('utf8'))
I hope this helps. Please let us know if you have any further questions.
Sincerely,
Keiji, Dataiku Technical Support
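For reuse across recipes, the upload step in the answer above can be wrapped in a small function. The following is a minimal sketch, not an official Dataiku helper: `upload_dataframe_as_csv` and the `InMemoryFolder` stand-in are hypothetical names introduced here so the example can run outside DSS. The only Dataiku call it assumes is `Folder.upload_data(path, data)`, as used in the answer.

```python
import pandas as pd


def upload_dataframe_as_csv(folder, file_name, df):
    """Serialize df to UTF-8 CSV bytes and upload them to a managed folder.

    `folder` is expected to expose upload_data(path, data), like dataiku.Folder.
    Returns the number of bytes uploaded.
    """
    data = df.to_csv(index=False).encode("utf-8")
    folder.upload_data(file_name, data)
    return len(data)


class InMemoryFolder:
    """Hypothetical stand-in for dataiku.Folder, for trying the helper outside DSS."""

    def __init__(self):
        self.files = {}

    def upload_data(self, path, data):
        # Keep the uploaded bytes in a dict instead of sending them to storage.
        self.files[path] = data


# Inside a DSS recipe you would pass dataiku.Folder("b1") instead of the stand-in.
demo_folder = InMemoryFolder()
demo_df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
size = upload_dataframe_as_csv(demo_folder, "some_file_name.csv", demo_df)
```

Passing the folder in as an argument keeps the managed folder name (not the SFTP path) in one place in the recipe.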
Hi Keiji,
Your solution worked, thank you!
Could you also give me a Python sample for importing files from SFTP into Dataiku?
Thanks a lot
Hello @pafj,
Thank you for the confirmation.
Here is sample Python 3 code for loading a CSV file from a DSS managed folder into a pandas DataFrame.
import dataiku
import pandas as pd
FOLDER_NAME = 'YOUR_FOLDER_NAME'
FILE_NAME = 'YOUR_FILE_NAME'
folder = dataiku.Folder(FOLDER_NAME)
with folder.get_download_stream(FILE_NAME) as f:
    df = pd.read_csv(f)
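If the folder holds several CSV files, the same pattern can be applied in a loop. This is a rough sketch under assumptions: `read_csv_files` and `InMemoryFolder` are hypothetical names used only for illustration, and the folder object is assumed to provide `list_paths_in_folder()` and `get_download_stream(path)` in the way `dataiku.Folder` does.

```python
import io

import pandas as pd


def read_csv_files(folder):
    """Return {path: DataFrame} for every .csv file in a managed folder.

    `folder` is expected to expose list_paths_in_folder() and
    get_download_stream(path), like dataiku.Folder.
    """
    frames = {}
    for path in folder.list_paths_in_folder():
        if path.lower().endswith(".csv"):
            with folder.get_download_stream(path) as stream:
                frames[path] = pd.read_csv(stream)
    return frames


class InMemoryFolder:
    """Hypothetical stand-in for dataiku.Folder, for trying the helper outside DSS."""

    def __init__(self, files):
        self.files = files

    def list_paths_in_folder(self):
        return sorted(self.files)

    def get_download_stream(self, path):
        # BytesIO is file-like and works as a context manager.
        return io.BytesIO(self.files[path])


# Inside a DSS recipe you would pass dataiku.Folder('YOUR_FOLDER_NAME') instead.
demo_folder = InMemoryFolder({
    "/sales.csv": b"id,amount\n1,10\n2,20\n",
    "/notes.txt": b"not a csv",
})
frames = read_csv_files(demo_folder)
```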
Hi @KeijiY
I wanted to ask a similar question about exporting an Excel file to the SFTP server or a SharePoint drive:
I am using the following Python script. The Excel file gets created in the folder, but it is 0 KB and won't open. Am I missing anything in the script? I changed my Python version to 3.6 in the code environment.
import dataiku
import pandas as pd
import xlsxwriter
DATASET_NAME3 = 'aa'
FOLDER_NAME3 = 'bb'
dataset = dataiku.Dataset(DATASET_NAME3)
df = dataset.get_dataframe()
import time
current_day = time.strftime("%Y-%m-%d")
FILE_NAME3 = "aa_"+current_day+".xlsx"
folder = dataiku.Folder(FOLDER_NAME3)
writer = pd.ExcelWriter(folder,engine='xlsxwriter')
folder.upload_data(FILE_NAME3, df.to_excel(writer,index=False, sheet_name='Sheet1'))
Hello @pafj ,
Thanks for another question.
Would you please try the following code?
import dataiku
import pandas as pd
import xlsxwriter
from io import BytesIO
DATASET_NAME3 = 'aa'
FOLDER_NAME3 = 'bb'
dataset = dataiku.Dataset(DATASET_NAME3)
df = dataset.get_dataframe()
import time
current_day = time.strftime("%Y-%m-%d")
FILE_NAME3 = "aa_"+current_day+".xlsx"
folder = dataiku.Folder(FOLDER_NAME3)
stream = BytesIO()
writer = pd.ExcelWriter(stream, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name='Sheet1')
writer.close()  # writes the workbook into the stream; writer.save() was removed in pandas 2.0
stream.seek(0)
folder.upload_stream(FILE_NAME3, stream)
Sincerely,
Keiji, Dataiku Technical Support
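A note on why the rewind in the answer above matters: in the original attempt, `df.to_excel(...)` returns `None`, so there was no workbook content to upload, which is consistent with the 0 KB file. In the working version the workbook is written into an in-memory buffer, and writing leaves the buffer's position at the end, so `stream.seek(0)` is presumably there to make the upload read from the start. The standard-library-only illustration below shows that behaviour; the byte content is made up and no Dataiku or pandas call is involved.

```python
from io import BytesIO

stream = BytesIO()
stream.write(b"workbook bytes")       # stands in for the data an Excel writer would produce

position_after_write = stream.tell()  # the cursor now sits at the end of the buffer
read_without_rewind = stream.read()   # reads from the cursor, so nothing is returned

stream.seek(0)                        # rewind to the start, as in the answer above
read_after_rewind = stream.read()     # now the full content is returned
```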
It worked. Thank you! You are the best!