BUG: Error introduced in dataiku.core.saved_model

UserBird (Dataiker, Alpha Tester, posts: 535)

Looks like an error was introduced in 5.0.1 (it worked in 5.0.0): dataiku.core.saved_model.Predictor.predict no longer works, because it raises a Pandas error:

ValueError: If using all scalar values, you must pass an index

Full error message:

/home/dataiku/dataiku-dss-5.0.1/python/dataiku/core/saved_model.pyc in predict(self, df, with_input_cols, with_prediction, with_probas, with_conditional_outputs, with_proba_percentile)
591 column_types[k] = np.object
592 pred_df = self._get_prediction_dataframe(dates_handled.astype(column_types), with_prediction, with_probas, with_conditional_outputs,
--> 593 with_proba_percentile)
594 if with_input_cols:
595 return pd.concat([df, pred_df], axis=1)

/home/dataiku/dataiku-dss-5.0.1/python/dataiku/core/saved_model.pyc in _get_prediction_dataframe(self, input_df, with_prediction, with_probas, with_conditional_outputs, with_proba_percentile)
456 with_conditional_outputs, with_proba_percentile):
457 if self.params.model_type == "PREDICTION":
--> 458 pred_df = self._prediction_type_dataframe(input_df, with_prediction, with_probas)
459 self._add_percentiles_and_condoutputs(pred_df, with_proba_percentile, with_conditional_outputs)
460 return pred_df

/home/dataiku/dataiku-dss-5.0.1/python/dataiku/core/saved_model.pyc in _prediction_type_dataframe(self, input_df, with_prediction, with_probas)
488 if prediction_type == "REGRESSION":
489 if with_prediction:
--> 490 pred_df = pd.DataFrame({"prediction": self._clf.predict(X)[0]})
491 else:
492 raise ValueError("Predicting a regression model with with_prediction=False. Oops.")

/home/dataiku/dss/condaenv/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
273 dtype=dtype, copy=copy)
274 elif isinstance(data, dict):
--> 275 mgr = self._init_dict(data, index, columns, dtype=dtype)
276 elif isinstance(data, ma.MaskedArray):
277 import numpy.ma.mrecords as mrecords

/home/dataiku/dss/condaenv/lib/python2.7/site-packages/pandas/core/frame.pyc in _init_dict(self, data, index, columns, dtype)
409 arrays = [data[k] for k in keys]
--> 411 return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
413 def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

/home/dataiku/dss/condaenv/lib/python2.7/site-packages/pandas/core/frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
5494 # figure out the index, if necessary
5495 if index is None:
-> 5496 index = extract_index(arrays)
5497 else:
5498 index = _ensure_index(index)

/home/dataiku/dss/condaenv/lib/python2.7/site-packages/pandas/core/frame.pyc in extract_index(data)
5534 if not indexes and not raw_lengths:
-> 5535 raise ValueError('If using all scalar values, you must pass'
5536 ' an index')

ValueError: If using all scalar values, you must pass an index
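The failing line in the traceback is `pd.DataFrame({"prediction": self._clf.predict(X)[0]})`. One plausible reading, which is an assumption and not confirmed in the replies below, is that `predict(X)[0]` ends up being a single scalar rather than an array of predictions, so pandas has no length from which to build an index. The minimal sketch below reproduces the same ValueError with plain NumPy and pandas, using a hypothetical stand-in array (`preds`) instead of the real DSS model:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a model's output: one prediction per input row.
preds = np.array([1.5, 2.0, 3.25])

# Indexing a 1-D array with [0] returns a single numpy scalar, not an array.
first = preds[0]

error_message = None
try:
    # A dict whose only value is a scalar: pandas cannot infer the row count.
    pd.DataFrame({"prediction": first})
except ValueError as exc:
    error_message = str(exc)

# Passing the whole 1-D array lets pandas infer a 3-row RangeIndex.
pred_df = pd.DataFrame({"prediction": preds})
print(error_message)
print(pred_df)
```

If this reading is right, the fix on the DSS side is to hand pandas an array-like of predictions rather than a single element.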


  • Samuel_R_ (Dataiker, posts: 8)

    Thank you for your feedback!

    We could reproduce the issue. It will be fixed in the forthcoming 5.0.2 release.

    Best regards
  • ricslator (Registered, posts: 1)

    The error message says that if you pass only scalar values, you must also pass an index. When pandas builds a DataFrame from a dictionary, it infers the row index from the length of the list-like values; if every value is a scalar, there is nothing to infer the number of rows from, so it asks you for the index (the row labels) explicitly. You can either wrap each value in a list so pandas can determine the index itself:

    df = pd.DataFrame({'A': [a], 'B': [b]})

    or use scalar values and pass an index:

    pd.DataFrame({'A': a, 'B': b}, index=[0])
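Both options from the reply above, as a small runnable sketch. The values of `a` and `b` are hypothetical placeholders; any scalars behave the same way:

```python
import pandas as pd

# Hypothetical scalar values standing in for `a` and `b` above.
a, b = 1, "x"

# Option 1: wrap each scalar in a list; pandas infers a one-row RangeIndex.
df_lists = pd.DataFrame({"A": [a], "B": [b]})

# Option 2: keep the scalars and pass the index explicitly.
df_index = pd.DataFrame({"A": a, "B": b}, index=[0])

# Both yield a single row with columns A and B.
print(df_lists)
print(df_index)
```

The list form is usually the more convenient one when the number of rows is known only from the data, while the explicit index is handy when you want specific row labels.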
