Save DataFrame to a managed folder

galapah
Level 1

I am trying to save a pandas DataFrame to a managed folder in Dataiku.

My code:

import dataiku
import pandas as pd

temp_folder = "reports_TEMP"
path_upload_file = "testfile.csv"
df = pd.DataFrame(range(0,10), columns=["test"])

handle = dataiku.Folder(temp_folder)
with handle.get_writer(path_upload_file) as w:
    df.to_csv(w)

This is the error I get:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-135-e90a7150097c> in <module>
      8 handle = dataiku.Folder(temp_folder)
      9 with handle.get_writer(path_upload_file) as w:
---> 10     df.to_csv(w)

/data/dataiku/dataiku-dss-6.0.1/dss_data/code-envs/python/Py_36_flight_risk/lib/python3.6/site-packages/pandas/core/generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, decimal)
   3200             doublequote=doublequote,
   3201             escapechar=escapechar,
-> 3202             decimal=decimal,
   3203         )
   3204         formatter.save()

/data/dataiku/dataiku-dss-6.0.1/dss_data/code-envs/python/Py_36_flight_risk/lib/python3.6/site-packages/pandas/io/formats/csvs.py in __init__(self, obj, path_or_buf, sep, na_rep, float_format, cols, header, index, index_label, mode, encoding, compression, quoting, line_terminator, chunksize, quotechar, date_format, doublequote, escapechar, decimal)
     64 
     65         self.path_or_buf, _, _, self.should_close = get_filepath_or_buffer(
---> 66             path_or_buf, encoding=encoding, compression=compression, mode=mode
     67         )
     68         self.sep = sep

/data/dataiku/dataiku-dss-6.0.1/dss_data/code-envs/python/Py_36_flight_risk/lib/python3.6/site-packages/pandas/io/common.py in get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode)
    198     if not is_file_like(filepath_or_buffer):
    199         msg = f"Invalid file path or buffer object type: {type(filepath_or_buffer)}"
--> 200         raise ValueError(msg)
    201 
    202     return filepath_or_buffer, None, compression, False

ValueError: Invalid file path or buffer object type: <class 'dataiku.core.managed_folder.ManagedFolderWriter'>
AlexT
Dataiker

Hi,

You can use the Export to Folder recipe to export a DSS dataset to a managed folder. 

If you would rather do this in code, you can use Folder.upload_data:

https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.Folder.upload_data

The following sample worked fine for me:

# -*- coding: utf-8 -*-
import dataiku

# Read the recipe input into a pandas DataFrame
base_64 = dataiku.Dataset("base_64")
df = base_64.get_dataframe()

# Serialize the DataFrame to CSV in memory and upload the encoded bytes
managed_folder_id = "output"
output_folder = dataiku.Folder(managed_folder_id)
filename = "my_file.csv"
output_folder.upload_data(filename, df.to_csv(index=False).encode("utf-8"))

 

AlexT
Dataiker

Additionally, if you do want to use get_writer(), you can use it like this:

import dataiku
import pandas as pd

temp_folder = "output"
path_upload_file = "chunck_written.csv"
input_dataset = dataiku.Dataset("dataset_name")

handle = dataiku.Folder(temp_folder)

df = input_dataset.get_dataframe()

# Serialize to CSV text and encode it to bytes before handing it to the writer
with handle.get_writer(path_upload_file) as w:
    w.write(df.to_csv().encode('utf-8'))
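
As for why df.to_csv(w) failed in the original post: judging from the traceback, pandas rejected the writer because it did not pass its file-like check. As far as I can tell, that check required a read or write method and also that the object be iterable. The sketch below approximates the check in plain Python. looks_file_like and BytesOnlyWriter are hypothetical stand-ins written for this illustration; they are not part of pandas or the dataiku API, and the CSV text is a hand-written sample rather than real to_csv output.

```python
import io


def looks_file_like(obj):
    """Approximation (assumption) of the buffer check pandas applied at the time."""
    if not (hasattr(obj, "read") or hasattr(obj, "write")):
        return False
    return hasattr(obj, "__iter__")


class BytesOnlyWriter:
    """Hypothetical stand-in for a writer that only exposes write(bytes)."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        # Accept bytes only, to mirror writing encoded CSV content
        if not isinstance(data, bytes):
            raise TypeError("expected bytes")
        self.chunks.append(data)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


# A write-only, non-iterable object fails the check; a real file object passes
rejected = not looks_file_like(BytesOnlyWriter())
accepted = looks_file_like(io.StringIO())

# Workaround pattern: build the CSV text first, encode it, then call write() directly
csv_text = "test\n0\n1\n2\n"  # hand-written sample standing in for df.to_csv(index=False)
with BytesOnlyWriter() as w:
    w.write(csv_text.encode("utf-8"))
written = b"".join(w.chunks)
```

This is the same pattern as above: pandas never sees the writer, it only produces a string, and the encoded bytes are passed to write().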

 

galapah
Level 1
Author

Thank you, Alex!

I need the second solution - just to test writing into a managed folder for another task.

It works!
