
Memory Error during Score recipe

I have a dataset (250k rows) that I need to predict labels for. However, whenever I run the Score recipe, I get a Memory Error. Is there a way to batch score datasets?
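Outside the built-in Score recipe (for example in a custom Python recipe), batch scoring generally means feeding the model a fixed number of rows at a time, so only one batch of features is built in memory at once. The following is a minimal sketch under that assumption. `batched`, `batch_score` and the `predict` callable are hypothetical names, and `predict` stands in for whatever model object you use. This is not the Dataiku API.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

Row = TypeVar("Row")
Label = TypeVar("Label")


def batched(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def batch_score(
    rows: Iterable[Row],
    predict: Callable[[Sequence[Row]], List[Label]],
    batch_size: int = 10_000,
) -> Iterator[Label]:
    """Score rows one batch at a time, so only one batch of features exists at once."""
    for batch in batched(rows, batch_size):
        # `predict` is a hypothetical stand-in for the trained model's predict call.
        yield from predict(batch)


if __name__ == "__main__":
    # Toy "model": label a tag string by how many tags it contains.
    def toy_predict(batch: Sequence[str]) -> List[str]:
        return ["many" if len(tags.split()) >= 3 else "few" for tags in batch]

    rows = ["a b c d", "x", "p q r"]
    print(list(batch_score(rows, toy_predict, batch_size=2)))
```

Because `batch_score` is a generator, results can be written out as they are produced instead of being collected into one large list.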

Scoring is already done in small batches. How much memory does your machine have? How much of it is free before you run the recipe? How many columns does the dataset have? What kind of preprocessing are you using (for example hashing, count vectorization, or TF-IDF)?
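To answer the "how much free memory" question on a Linux host, `free -h` in a shell shows it directly. From Python, the same figure can be read from `/proc/meminfo`. The sketch below assumes a Linux kernel recent enough (3.14+) to report `MemAvailable`; the helper names `parse_meminfo` and `available_gib` are hypothetical.

```python
import os
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: first numeric value}."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            # Most fields are reported in kB (kibibytes); a few are plain counts.
            fields[key.strip()] = int(parts[0])
    return fields


def available_gib(path: str = "/proc/meminfo") -> float:
    """Memory the kernel estimates is available without swapping, in GiB."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    # Assumes a kernel that reports MemAvailable (Linux 3.14 and later).
    return info["MemAvailable"] / (1024 * 1024)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    print(f"Available: {available_gib():.2f} GiB")
```

Running this just before starting the recipe gives a rough idea of the headroom the scoring process will have.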
I have 8 GB of memory. About 1 GB is still free on the machine; nearly all of the other 7 GB is used by DSS. The input dataset has 8 columns, but I apply TF-IDF vectorization to a column containing many tags. The "Algorithm" tab in the model view says there are 1016 columns after preprocessing. The estimated memory usage is 94 MB (for training only, I guess).
Training works fine on 25k rows, but scoring the 250k rows fails.
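For scale, here is the size of a dense float64 feature matrix with the 1016 post-preprocessing columns mentioned above, at 25k and 250k rows. This is plain arithmetic under the assumption that every feature becomes an 8-byte float and that all rows are densified in one piece. The thread does not say how large DSS's internal scoring batches actually are. `dense_bytes` is a hypothetical helper.

```python
def dense_bytes(n_rows: int, n_cols: int, itemsize: int = 8) -> int:
    """Bytes for an n_rows x n_cols dense matrix; itemsize 8 corresponds to float64."""
    return n_rows * n_cols * itemsize


N_COLS = 1016  # columns after preprocessing, as reported in the model view

for n_rows in (25_000, 250_000):
    gb = dense_bytes(n_rows, N_COLS) / 1e9
    print(f"{n_rows:>7,} rows x {N_COLS} cols as float64: {gb:.2f} GB")
```

This gives about 0.20 GB for 25k rows and about 2.03 GB for 250k rows. Since the traceback ends in a `.copy()` inside `pd.concat`, peak usage during that step could briefly approach twice the matrix size, which would be well beyond the roughly 1 GB left free.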

Here's the traceback as well:
[11:57:18] [INFO] [dku.utils] - Traceback (most recent call last):
[11:57:18] [INFO] [dku.utils] - File "/usr/lib64/python2.7/", line 174, in _run_module_as_main
[11:57:18] [INFO] [dku.utils] - "__main__", fname, loader, pkg_name)
[11:57:18] [INFO] [dku.utils] - File "/usr/lib64/python2.7/", line 72, in _run_code
[11:57:18] [INFO] [dku.utils] - exec code in run_globals
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/", line 146, in
[11:57:18] [INFO] [dku.utils] - json.load_from_filepath(sys.argv[7]))
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/", line 133, in main
[11:57:18] [INFO] [dku.utils] - for output_df in output_generator():
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/", line 78, in output_generator
[11:57:18] [INFO] [dku.utils] - output_probas=recipe_desc["outputProbabilities"])
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/", line 197, in binary_classification_predict
[11:57:18] [INFO] [dku.utils] - (pred_df, proba_df) = binary_classification_predict_ex(clf, modeling_params, target_map, threshold, transformed, output_probas)
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/", line 148, in binary_classification_predict_ex
[11:57:18] [INFO] [dku.utils] - features_X_df = features_X.as_dataframe()
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/", line 253, in as_dataframe
[11:57:18] [INFO] [dku.utils] - return pd.concat(blockvals, axis=1)
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/tools/", line 846, in concat
[11:57:18] [INFO] [dku.utils] - return op.get_result()
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/tools/", line 1038, in get_result
[11:57:18] [INFO] [dku.utils] - copy=self.copy)
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/core/", line 4545, in concatenate_block_managers
[11:57:18] [INFO] [dku.utils] - for placement, join_units in concat_plan]
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/core/", line 4648, in concatenate_join_units
[11:57:18] [INFO] [dku.utils] - concat_values = concat_values.copy()
[11:57:18] [INFO] [dku.utils] - MemoryError
[11:57:18] [INFO] [dku.flow.activity] - Run thread failed for activity score_Companies_unlabelled_AI_prepared_NP
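The traceback fails inside `features_X.as_dataframe()`, where `pd.concat` copies the feature blocks into one DataFrame. If the TF-IDF output is held as a sparse matrix before that step (an assumption; the thread does not say), densifying it in row chunks bounds the dense memory to about `chunk_rows * n_cols` values at a time. The sketch below does this in pure Python over CSR arrays (`data`, `indices`, `indptr`). With SciPy, the equivalent would be row-slicing a `csr_matrix` and calling `.toarray()` on each slice. `dense_chunks` is a hypothetical helper, not part of DSS.

```python
from typing import Iterator, List, Sequence


def dense_chunks(
    data: Sequence[float],
    indices: Sequence[int],
    indptr: Sequence[int],
    n_cols: int,
    chunk_rows: int,
) -> Iterator[List[List[float]]]:
    """Yield dense row blocks of at most chunk_rows rows from a CSR matrix."""
    if chunk_rows <= 0:
        raise ValueError("chunk_rows must be positive")
    n_rows = len(indptr) - 1
    for start in range(0, n_rows, chunk_rows):
        block = []
        for r in range(start, min(start + chunk_rows, n_rows)):
            row = [0.0] * n_cols
            # Non-zeros of row r live in data[indptr[r]:indptr[r + 1]].
            for k in range(indptr[r], indptr[r + 1]):
                row[indices[k]] = data[k]
            block.append(row)
        yield block
```

Each yielded block can be scored and discarded before the next one is built, so the full dense matrix never has to exist at once.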

