Save DataFrame to a managed folder
galapah
Registered Posts: 2 ✭✭✭
I am trying to save a pandas DataFrame to a managed folder in Dataiku.
My code:
import dataiku
import pandas as pd

temp_folder = "reports_TEMP"
path_upload_file = "testfile.csv"

df = pd.DataFrame(range(0,10), columns=["test"])
handle = dataiku.Folder(temp_folder)
with handle.get_writer(path_upload_file) as w:
    df.to_csv(w)
and this is the error that I get:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-135-e90a7150097c> in <module>
      8 handle = dataiku.Folder(temp_folder)
      9 with handle.get_writer(path_upload_file) as w:
---> 10     df.to_csv(w)

/data/dataiku/dataiku-dss-6.0.1/dss_data/code-envs/python/Py_36_flight_risk/lib/python3.6/site-packages/pandas/core/generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, decimal)
   3200             doublequote=doublequote,
   3201             escapechar=escapechar,
-> 3202             decimal=decimal,
   3203         )
   3204         formatter.save()

/data/dataiku/dataiku-dss-6.0.1/dss_data/code-envs/python/Py_36_flight_risk/lib/python3.6/site-packages/pandas/io/formats/csvs.py in __init__(self, obj, path_or_buf, sep, na_rep, float_format, cols, header, index, index_label, mode, encoding, compression, quoting, line_terminator, chunksize, quotechar, date_format, doublequote, escapechar, decimal)
     64
     65         self.path_or_buf, _, _, self.should_close = get_filepath_or_buffer(
---> 66             path_or_buf, encoding=encoding, compression=compression, mode=mode
     67         )
     68         self.sep = sep

/data/dataiku/dataiku-dss-6.0.1/dss_data/code-envs/python/Py_36_flight_risk/lib/python3.6/site-packages/pandas/io/common.py in get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode)
    198     if not is_file_like(filepath_or_buffer):
    199         msg = f"Invalid file path or buffer object type: {type(filepath_or_buffer)}"
--> 200         raise ValueError(msg)
    201
    202     return filepath_or_buffer, None, compression, False

ValueError: Invalid file path or buffer object type: <class 'dataiku.core.managed_folder.ManagedFolderWriter'>
Answers
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi,
You can use the Export to Folder recipe to export a DSS dataset to a managed folder.
If you want to do this in code, you can try Folder.upload_data:
https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.Folder.upload_data
The following sample worked fine for me:
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
base_64 = dataiku.Dataset("base_64")
df = base_64.get_dataframe()

managed_folder_id = "output"
output_folder = dataiku.Folder(managed_folder_id)

filename = "my_file.csv"
output_folder.upload_data(filename, df.to_csv(index=False).encode("utf-8"))
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
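Additionally, if you do want to use get_writer(), note that pandas rejects the writer object itself (the is_file_like check in your traceback fails for ManagedFolderWriter). Instead, build the CSV as a string, encode it, and write the bytes yourself: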
import dataiku
import pandas as pd

temp_folder = "output"
path_upload_file = "chunck_written.csv"

input_dataset = dataiku.Dataset("dataset_name")
handle = dataiku.Folder(temp_folder)
df = input_dataset.get_dataframe()

with handle.get_writer(path_upload_file) as w:
    w.write(df.to_csv().encode('utf-8'))
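For anyone who wants to reproduce this outside DSS, here is a minimal sketch of why the original call fails and how the write-bytes approach can be extended to larger DataFrames. The class WriteOnlySink and the function write_csv_in_chunks are hypothetical names made up for this illustration, not part of the dataiku API. The sink only mimics a writer that has write() but is not iterable, which I assume is why ManagedFolderWriter fails the pandas check. The sketch also assumes the real writer accepts several write() calls with bytes.

```python
import pandas as pd
from pandas.api.types import is_file_like


class WriteOnlySink:
    """Hypothetical stand-in for a writer that only exposes write(bytes)."""

    def __init__(self):
        self.data = b""

    def write(self, payload):
        self.data += payload
        return len(payload)


def write_csv_in_chunks(df, writer, chunk_rows=50000, encoding="utf-8"):
    """Write df as CSV to any object with write(bytes), a slice of rows at a time."""
    # max(..., 1) ensures an empty DataFrame still produces its header line.
    for start in range(0, max(len(df), 1), chunk_rows):
        chunk = df.iloc[start:start + chunk_rows]
        # Emit the header only with the first chunk.
        text = chunk.to_csv(index=False, header=(start == 0))
        writer.write(text.encode(encoding))


df = pd.DataFrame({"test": range(10)})
sink = WriteOnlySink()

# pandas treats an object as file-like only if it has read() or write()
# AND defines __iter__; this sink has write() only, so the check fails.
sink_is_file_like = is_file_like(sink)

# Writing encoded bytes directly sidesteps that check.
write_csv_in_chunks(df, sink, chunk_rows=3)
```

Inside DSS the same function would be called with the writer from `with handle.get_writer(path_upload_file) as w:` in place of the sink. Writing in slices avoids holding the whole CSV string in memory at once, which matters only for large DataFrames; for small ones the single w.write(...) call above is simpler.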
-
Thank you, Alex!
I need the second solution - just to test writing into a managed folder for another task.
It works!