Save DataFrame to a managed folder

galapah
edited July 16 in Using Dataiku

I am trying to save a pandas DataFrame to a managed folder in Dataiku.

My code:

import dataiku
import pandas as pd

temp_folder = "reports_TEMP"
path_upload_file = "testfile.csv"
df = pd.DataFrame(range(0,10), columns=["test"])

handle = dataiku.Folder(temp_folder)
with handle.get_writer(path_upload_file) as w:
    df.to_csv(w)

And this is the error I get:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-135-e90a7150097c> in <module>
      8 handle = dataiku.Folder(temp_folder)
      9 with handle.get_writer(path_upload_file) as w:
---> 10     df.to_csv(w)

/data/dataiku/dataiku-dss-6.0.1/dss_data/code-envs/python/Py_36_flight_risk/lib/python3.6/site-packages/pandas/core/generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, decimal)
   3200             doublequote=doublequote,
   3201             escapechar=escapechar,
-> 3202             decimal=decimal,
   3203         )
   3204         formatter.save()

/data/dataiku/dataiku-dss-6.0.1/dss_data/code-envs/python/Py_36_flight_risk/lib/python3.6/site-packages/pandas/io/formats/csvs.py in __init__(self, obj, path_or_buf, sep, na_rep, float_format, cols, header, index, index_label, mode, encoding, compression, quoting, line_terminator, chunksize, quotechar, date_format, doublequote, escapechar, decimal)
     64 
     65         self.path_or_buf, _, _, self.should_close = get_filepath_or_buffer(
---> 66             path_or_buf, encoding=encoding, compression=compression, mode=mode
     67         )
     68         self.sep = sep

/data/dataiku/dataiku-dss-6.0.1/dss_data/code-envs/python/Py_36_flight_risk/lib/python3.6/site-packages/pandas/io/common.py in get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode)
    198     if not is_file_like(filepath_or_buffer):
    199         msg = f"Invalid file path or buffer object type: {type(filepath_or_buffer)}"
--> 200         raise ValueError(msg)
    201 
    202     return filepath_or_buffer, None, compression, False

ValueError: Invalid file path or buffer object type: <class 'dataiku.core.managed_folder.ManagedFolderWriter'>

Answers

  • Alexandru (Dataiker)
    edited July 17

    Hi,

    You can use the Export to Folder recipe to export a DSS dataset to a managed folder.

    If you want to do this in code, you can use Folder.upload_data() (https://doc.dataiku.com/dss/latest/python-api/managed_folders.html#dataiku.Folder.upload_data). The following sample worked fine for me:

    # -*- coding: utf-8 -*-
    import dataiku

    # Read the recipe input dataset into a pandas DataFrame
    base_64 = dataiku.Dataset("base_64")
    df = base_64.get_dataframe()

    # Target managed folder (by id) and the name of the file to create in it
    managed_folder_id = "output"
    output_folder = dataiku.Folder(managed_folder_id)
    filename = "my_file.csv"

    # Serialize to CSV (without the index), encode to UTF-8 bytes and upload in one call
    output_folder.upload_data(filename, df.to_csv(index=False).encode("utf-8"))
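    If you want to sanity-check the payload passed to upload_data() without DSS, you can build the same bytes and read them back with pandas. This is a minimal sketch that uses only pandas: the small DataFrame is a stand-in for the dataset read from "base_64", and no Dataiku API is called.

```python
import io

import pandas as pd

# Stand-in for the DataFrame read from the input dataset
df = pd.DataFrame({"test": range(10)})

# Same serialization as in the recipe above: CSV text without the index, encoded to UTF-8
payload = df.to_csv(index=False).encode("utf-8")

# upload_data() is given raw bytes, not a str
assert isinstance(payload, bytes)

# Reading the bytes back should reproduce the column and its values
round_trip = pd.read_csv(io.BytesIO(payload))
```

    Passing index=False keeps the pandas row index out of the file, so the CSV contains only the "test" column.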

  • Alexandru (Dataiker)
    edited July 17

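    Alternatively, if you specifically want to use get_writer(), write the encoded CSV text to the writer instead of passing the writer to to_csv(). As the traceback shows, pandas does not accept the ManagedFolderWriter as a file path or buffer. For example: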

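    import dataiku

    temp_folder = "output"  # managed folder id
    path_upload_file = "chunk_written.csv"
    input_dataset = dataiku.Dataset("dataset_name")  # replace with your dataset name

    handle = dataiku.Folder(temp_folder)

    df = input_dataset.get_dataframe()

    # Build the CSV text, encode it as UTF-8 bytes and write those bytes,
    # rather than handing the writer object itself to df.to_csv()
    with handle.get_writer(path_upload_file) as w:
        w.write(df.to_csv().encode('utf-8'))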
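    To see why df.to_csv(w) fails while w.write(...) works, here is a minimal sketch that runs without DSS. In the pandas version from the traceback, the ValueError is raised when pandas.api.types.is_file_like() returns False; that check requires a read or write method and an __iter__ method. BytesOnlyWriter is a hypothetical stand-in, not the real ManagedFolderWriter. It assumes, as the snippet above implies, that the writer only needs a write() method that accepts bytes.

```python
import pandas as pd
from pandas.api.types import is_file_like


class BytesOnlyWriter:
    """Hypothetical stand-in for a writer that only exposes write(bytes)."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        # Behave like a binary stream: accept bytes, reject str
        if not isinstance(data, bytes):
            raise TypeError("expected bytes, got %s" % type(data).__name__)
        self.chunks.append(data)

    def getvalue(self):
        return b"".join(self.chunks)


df = pd.DataFrame({"test": range(10)})
w = BytesOnlyWriter()

# Has write() but no __iter__, so pandas does not treat it as file-like;
# this failed check is what makes df.to_csv(w) raise the ValueError
print(is_file_like(w))  # False

# Workaround from the answer: build the CSV text, encode it, write the bytes
w.write(df.to_csv().encode("utf-8"))
print(w.getvalue().splitlines()[0])
```

    Because this call omits index=False, to_csv() also writes the row index as an unnamed first column. That is why the header line starts with a comma.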

  • galapah

    Thank you, Alex!

    I need the second solution: I just wanted to test writing into a managed folder for another task.

    It works!
