
Memory Error during Score recipe

UserBird
Dataiker
I have a dataset (250k rows) that I need to predict labels for. However, whenever I run the Score recipe I run into a Memory Error. Is there a way to batch score datasets?
Clément_Stenac
Dataiker
Hi,

Scoring is already done in small batches. How much memory does your machine have, and how much of it is free before you run the recipe? How many columns does the dataset have? What kind of preprocessing are you using (for example hashing, count vectorization, or TF-IDF)?
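To make "scoring in small batches" concrete, here is a minimal sketch of the general pattern in plain Python. It is not the DSS implementation: `iter_batches`, `score_in_batches`, and `toy_predict` are hypothetical names, and the batch size is an arbitrary assumption.

```python
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield successive lists of at most batch_size rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def score_in_batches(rows, predict, batch_size=1000):
    """Call predict on one batch at a time and yield one prediction per row."""
    for batch in iter_batches(rows, batch_size):
        for prediction in predict(batch):
            yield prediction


# Hypothetical stand-in for a model: label 1 if the value exceeds 0.5.
def toy_predict(batch):
    return [1 if value > 0.5 else 0 for value in batch]


predictions = list(score_in_batches([0.1, 0.9, 0.7, 0.2, 0.6], toy_predict, batch_size=2))
print(predictions)
```

With this pattern only one batch of rows is held in memory at a time, so peak usage depends on the batch size and the width of the preprocessed features rather than on the total row count.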
Wuser92
Level 2
I have 8 GB of memory available. 1 GB is still free on the machine; pretty much all of the other 7 GB is used by DSS. The input dataset has 8 columns, but I apply TF-IDF vectorization to a column containing lots of tags. The "algorithm" tab in the model view says that after preprocessing there are 1016 columns. The estimated memory usage is 94 MB (for training only, I guess).
The training works perfectly with 25k rows, but the scoring on the 250k rows fails.

Here's the traceback also:
[11:57:18] [INFO] [dku.utils] - Traceback (most recent call last):
[11:57:18] [INFO] [dku.utils] - File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
[11:57:18] [INFO] [dku.utils] - "__main__", fname, loader, pkg_name)
[11:57:18] [INFO] [dku.utils] - File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
[11:57:18] [INFO] [dku.utils] - exec code in run_globals
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/reg_scoring_recipe.py", line 146, in
[11:57:18] [INFO] [dku.utils] - json.load_from_filepath(sys.argv[7]))
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/reg_scoring_recipe.py", line 133, in main
[11:57:18] [INFO] [dku.utils] - for output_df in output_generator():
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/reg_scoring_recipe.py", line 78, in output_generator
[11:57:18] [INFO] [dku.utils] - output_probas=recipe_desc["outputProbabilities"])
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/classification_scoring.py", line 197, in binary_classification_predict
[11:57:18] [INFO] [dku.utils] - (pred_df, proba_df) = binary_classification_predict_ex(clf, modeling_params, target_map, threshold, transformed, output_probas)
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/classification_scoring.py", line 148, in binary_classification_predict_ex
[11:57:18] [INFO] [dku.utils] - features_X_df = features_X.as_dataframe()
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/multiframe.py", line 253, in as_dataframe
[11:57:18] [INFO] [dku.utils] - return pd.concat(blockvals, axis=1)
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 846, in concat
[11:57:18] [INFO] [dku.utils] - return op.get_result()
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 1038, in get_result
[11:57:18] [INFO] [dku.utils] - copy=self.copy)
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 4545, in concatenate_block_managers
[11:57:18] [INFO] [dku.utils] - for placement, join_units in concat_plan]
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 4648, in concatenate_join_units
[11:57:18] [INFO] [dku.utils] - concat_values = concat_values.copy()
[11:57:18] [INFO] [dku.utils] - MemoryError
[11:57:18] [INFO] [dku.flow.activity] - Run thread failed for activity score_Companies_unlabelled_AI_prepared_NP
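
The traceback ends inside `pd.concat`, at a `concat_values.copy()` call, so the failure happens while a dense copy of the feature matrix is being allocated. A rough back-of-the-envelope estimate of that matrix's size can be sketched as follows. The assumptions are loud ones: dense storage, 8-byte float64 values, and a hypothetical batch of 10,000 rows (the actual batch size used by the recipe is not stated in this thread).

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Bytes needed for a dense n_rows x n_cols matrix (float64 by default)."""
    return n_rows * n_cols * bytes_per_value


def to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9


N_COLS = 1016  # columns after preprocessing, as reported in the model view

# All 250k rows held densely at once (assumption: no batching, float64).
full_bytes = dense_matrix_bytes(250000, N_COLS)

# One hypothetical 10,000-row batch (assumption: the real batch size is unknown).
batch_bytes = dense_matrix_bytes(10000, N_COLS)

# A concat that copies needs source and destination at the same time,
# so the transient peak is roughly twice the matrix size.
print("full matrix: %.2f GB, peak during copy: ~%.2f GB"
      % (to_gb(full_bytes), to_gb(2 * full_bytes)))
print("one batch:   %.3f GB, peak during copy: ~%.3f GB"
      % (to_gb(batch_bytes), to_gb(2 * batch_bytes)))
```

Under these assumptions the full matrix is about 2 GB, which already exceeds the 1 GB reported free, while a single batch of that size would be well under 0.2 GB even with the extra copy.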