Unable to write to dataset - TypeError: 'ObjectBlock' object is not iterable
When I try to write a pandas dataframe to a dataset, i.e.
df = pd.DataFrame(...)
ds = dataiku.Dataset(...)
ds.write_with_schema(df, dropAndCreate=True)
I get the following error:
TypeError: 'ObjectBlock' object is not iterable
Has anyone tried something similar, or does anyone know what might be going wrong? Inspecting the dataset, I can see that the schema is written as intended, but no data is written.
EDIT: The error occurs only when I call the code from outside Dataiku. If I create a recipe inside Dataiku, it works as intended.
Answers
-
Hi,
If it works in DSS, then the cause may be a discrepancy in package versions, notably of pandas and/or NumPy. Can you post the output of `pip list` for the Python environment where you get this error? Also, what's the full stack trace of the error (i.e. where is the error raised from)?
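If the full `pip list` output is too long to post, a short script along these lines would print just the relevant versions. This is only a sketch: the helper name `report_versions` is made up for illustration, and it relies on `importlib.metadata`, which is in the standard library from Python 3.8 onwards.

```python
import sys
from importlib import metadata


def report_versions(packages=("pandas", "numpy")):
    """Return a dict of the interpreter version and each package's installed version.

    A package that is not installed in the current environment maps to None.
    """
    versions = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print in the same "name==version" form that pip uses.
    for name, version in report_versions().items():
        print(f"{name}=={version}")
```

Running it once in the external environment and once in a notebook or recipe inside DSS should make any version mismatch easy to spot.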
-
The full stack trace is:
Traceback (most recent call last):
  File "/home/emher/Projects/pipeline_test/PFC_python_library/tmp.py", line 35, in <module>
    pfc.delivery.deliver(df_ecmwf, df_pfc, row.to_dict())
  File "/home/emher/Projects/pipeline_test/PFC_python_library/pfc/delivery.py", line 330, in deliver
    log_handler(meta_data, delivery_blob, error)
  File "/home/emher/Projects/pipeline_test/PFC_python_library/tmp.py", line 31, in <lambda>
    pfc.delivery.log_handler = lambda x, y, z: log_to_dataset(x, y, z, dataset=dataiku.Dataset("delivery_logs"))
  File "/home/emher/Projects/pipeline_test/PFC_python_library/pfc/delivery.py", line 402, in log_to_dataset
    writer.write_dataframe(df_log)
  File "/home/emher/Projects/pipeline_test/PFC_python_library/venv/lib/python3.8/site-packages/dataiku/core/dataset_write.py", line 395, in write_dataframe
    dku_pandas_csv.DKUCSVFormatter(df, self.remote_writer,
  File "/home/emher/Projects/pipeline_test/PFC_python_library/venv/lib/python3.8/site-packages/dataiku/core/dku_pandas_csv.py", line 197, in save
    self._save()
  File "/home/emher/Projects/pipeline_test/PFC_python_library/venv/lib/python3.8/site-packages/dataiku/core/dku_pandas_csv.py", line 296, in _save
    self._save_chunk(start_i, end_i)
  File "/home/emher/Projects/pipeline_test/PFC_python_library/venv/lib/python3.8/site-packages/dataiku/core/dku_pandas_csv.py", line 318, in _save_chunk
    for col_loc, col in zip(b.mgr_locs, d):
TypeError: 'ObjectBlock' object is not iterable
0 rows successfully written (NBRAVMN4TD)
Hence the error arises within dku_pandas_csv. Locally, my numpy and pandas versions are:
numpy==1.20.0
pandas==1.2.1
I am not sure what version Dataiku uses internally. Do you know where I can see this?
-
Hi,
You're indeed using a very recent pandas (1.2.1), and Python 3.8 judging by the paths in your stack trace. DSS's Python code doesn't handle that pandas version yet, so you should downgrade to pandas>=1.0,<1.1, and possibly switch to Python 3.6 if that doesn't resolve the error.
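After downgrading (for example with `pip install "pandas>=1.0,<1.1"`), it may be worth guarding the write with a version check so that the failure is obvious before any data is sent. The following is a rough sketch, not part of the Dataiku API: `parse_release` and `in_range` are hypothetical helper names, and the comparison only looks at the numeric release part of the version, ignoring pre-release suffixes.

```python
import re


def parse_release(version):
    """Extract the leading numeric release components, e.g. '1.2.1' -> (1, 2, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def in_range(version, lower=(1, 0), upper=(1, 1)):
    """Return True if lower <= version < upper, comparing release tuples.

    The defaults mirror the pandas>=1.0,<1.1 range suggested above.
    """
    return lower <= parse_release(version) < upper


# Typical use in the external script (assumes pandas is installed):
#     import pandas as pd
#     if not in_range(pd.__version__):
#         raise RuntimeError(f"pandas {pd.__version__} is outside the >=1.0,<1.1 range")
```

With the versions from this thread, `in_range("1.2.1")` is false, while something like `in_range("1.0.5")` is true.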