Unable to write to dataset - TypeError: 'ObjectBlock' object is not iterable

emher
emher Registered Posts: 32 ✭✭✭✭✭
edited July 16 in Using Dataiku

When I try to write a pandas DataFrame to a dataset, i.e.

import dataiku
import pandas as pd

df = pd.DataFrame(...)
ds = dataiku.Dataset(...)
ds.write_with_schema(df, dropAndCreate=True)

I get the following error:

TypeError: 'ObjectBlock' object is not iterable

Has anyone tried something similar, or do you know what might be going wrong? Inspecting the dataset, I can see that the schema is written as intended, but no data is written.

EDIT: The error occurs only when I call the code from outside Dataiku. If I create a recipe inside Dataiku, it works as intended.

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    If it works in DSS, then the cause may be a discrepancy in package versions, notably of pandas and/or NumPy. Can you share the `pip list` output of the Python environment where you get this error? Also, what is the full stack trace of the error (i.e. where is it raised from)?
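One way to compare the two environments is to run the same small version report both in a DSS recipe or notebook and in the external environment. The following is a minimal sketch, not part of the Dataiku API: `report_versions` is a hypothetical helper name, and it relies only on the standard library (`importlib.metadata`, available from Python 3.8).

```python
import sys
from importlib import metadata


def report_versions(packages=("pandas", "numpy")):
    """Return a dict mapping 'python' and each package name to a version string.

    Packages that are not installed map to None.
    (Hypothetical helper for illustration.)
    """
    versions = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one line per entry, in a pip-like "name==version" form.
    for name, version in report_versions().items():
        print(f"{name}=={version}" if version else f"{name} is not installed")
```

Running it in both places and comparing the output line by line should show whether pandas or NumPy differ between the environment where the write works and the one where it fails.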

  • emher
    emher Registered Posts: 32 ✭✭✭✭✭

    The full stack trace is,

    Traceback (most recent call last):
      File "/home/emher/Projects/pipeline_test/PFC_python_library/tmp.py", line 35, in <module>
        pfc.delivery.deliver(df_ecmwf, df_pfc, row.to_dict())
      File "/home/emher/Projects/pipeline_test/PFC_python_library/pfc/delivery.py", line 330, in deliver
        log_handler(meta_data, delivery_blob, error)
      File "/home/emher/Projects/pipeline_test/PFC_python_library/tmp.py", line 31, in <lambda>
        pfc.delivery.log_handler = lambda x, y, z: log_to_dataset(x, y, z, dataset=dataiku.Dataset("delivery_logs"))
      File "/home/emher/Projects/pipeline_test/PFC_python_library/pfc/delivery.py", line 402, in log_to_dataset
        writer.write_dataframe(df_log)
      File "/home/emher/Projects/pipeline_test/PFC_python_library/venv/lib/python3.8/site-packages/dataiku/core/dataset_write.py", line 395, in write_dataframe
        dku_pandas_csv.DKUCSVFormatter(df, self.remote_writer,
      File "/home/emher/Projects/pipeline_test/PFC_python_library/venv/lib/python3.8/site-packages/dataiku/core/dku_pandas_csv.py", line 197, in save
        self._save()
      File "/home/emher/Projects/pipeline_test/PFC_python_library/venv/lib/python3.8/site-packages/dataiku/core/dku_pandas_csv.py", line 296, in _save
        self._save_chunk(start_i, end_i)
      File "/home/emher/Projects/pipeline_test/PFC_python_library/venv/lib/python3.8/site-packages/dataiku/core/dku_pandas_csv.py", line 318, in _save_chunk
        for col_loc, col in zip(b.mgr_locs, d):
    TypeError: 'ObjectBlock' object is not iterable
    0 rows successfully written (NBRAVMN4TD)

    So the error is raised inside dku_pandas_csv. Locally, my numpy and pandas versions are:
    numpy==1.20.0
    pandas==1.2.1
    I am not sure which versions Dataiku uses internally. Do you know where I can see this?
  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    You're indeed using a very recent pandas (1.2.1), together with Python 3.8 according to the venv path in your stack trace. DSS's Python code doesn't handle that pandas version yet, so you should downgrade to pandas>=1.0,<1.1. If that doesn't solve the error, try running under Python 3.6 as well.
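The pin itself can be applied with `pip install "pandas>=1.0,<1.1"`. To fail early with a readable message instead of the `ObjectBlock` error, an external script could also check the installed pandas version before writing. Below is a minimal sketch under that assumption; `parse_major_minor` and `pandas_in_supported_range` are hypothetical helper names, and the default bounds simply encode the range suggested above rather than anything published by Dataiku.

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) as integers from a version string such as '1.2.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def pandas_in_supported_range(version, lower=(1, 0), upper=(1, 1)):
    """Return True if lower <= (major, minor) < upper.

    The default bounds mirror the pandas>=1.0,<1.1 suggestion in this thread
    (an assumption taken from the answer, not an official compatibility table).
    """
    return lower <= parse_major_minor(version) < upper
```

In the calling script this could be used as `pandas_in_supported_range(pd.__version__)` right before `write_with_schema`, raising an error with upgrade or downgrade instructions when it returns `False`.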
