Memory Error during Score recipe

UserBird
UserBird Dataiker, Alpha Tester Posts: 535 Dataiker
I have a dataset (250k rows) that I need to predict labels for. However, whenever I run the Score recipe I run into a Memory Error. Is there a way to batch score datasets?

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    Hi,

    Scoring is already done in small batches. How much memory does your machine have? How much is free before you run the recipe? How many columns are in the dataset? What kind of preprocessing are you using (e.g. hashing, count vectorization, or TF-IDF)?
  • Wuser92
    Wuser92 Registered Posts: 20 ✭✭✭✭
    I have 8 GB of memory available. 1 GB is still free on the machine; pretty much all of the other 7 GB is used by DSS. The input dataset has 8 columns, but I apply TF-IDF vectorization to a column containing lots of tags. The "algorithm" tab in the model view says that after preprocessing there are 1016 columns. Estimated memory usage is 94 MB (for training only, I guess).
    The training works perfectly with 25k rows, but the scoring on the 250k rows fails.
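
    As a rough sanity check on the numbers above, the sketch below estimates how much memory a fully dense float64 matrix of the preprocessed features would take. It assumes 8 bytes per value and that the whole matrix is materialized at once, which may not match what DSS actually does since scoring is said to run in batches; `dense_bytes` is a hypothetical helper, not part of DSS.

```python
def dense_bytes(n_rows, n_cols, bytes_per_value=8):
    """Bytes needed for a dense n_rows x n_cols array (float64 by default)."""
    return n_rows * n_cols * bytes_per_value


GIB = 1024 ** 3

# Figures taken from the post: 1016 columns after preprocessing.
scoring = dense_bytes(250_000, 1016)   # dataset to score
training = dense_bytes(25_000, 1016)   # training sample

print(f"scoring set : {scoring / GIB:.2f} GiB")
print(f"training set: {training / GIB:.2f} GiB")
```

    Under these assumptions the scoring set needs about ten times the memory of the training sample (roughly 2 GB dense), and a copy made during concatenation would briefly need that amount again, which would not fit in 1 GB of free memory.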

    Here's the traceback as well:
    [11:57:18] [INFO] [dku.utils] - Traceback (most recent call last):
    [11:57:18] [INFO] [dku.utils] - File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
    [11:57:18] [INFO] [dku.utils] - "__main__", fname, loader, pkg_name)
    [11:57:18] [INFO] [dku.utils] - File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    [11:57:18] [INFO] [dku.utils] - exec code in run_globals
    [11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/reg_scoring_recipe.py", line 146, in <module>
    [11:57:18] [INFO] [dku.utils] - json.load_from_filepath(sys.argv[7]))
    [11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/reg_scoring_recipe.py", line 133, in main
    [11:57:18] [INFO] [dku.utils] - for output_df in output_generator():
    [11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/reg_scoring_recipe.py", line 78, in output_generator
    [11:57:18] [INFO] [dku.utils] - output_probas=recipe_desc["outputProbabilities"])
    [11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/classification_scoring.py", line 197, in binary_classification_predict
    [11:57:18] [INFO] [dku.utils] - (pred_df, proba_df) = binary_classification_predict_ex(clf, modeling_params, target_map, threshold, transformed, output_probas)
    [11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/classification_scoring.py", line 148, in binary_classification_predict_ex
    [11:57:18] [INFO] [dku.utils] - features_X_df = features_X.as_dataframe()
    [11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/multiframe.py", line 253, in as_dataframe
    [11:57:18] [INFO] [dku.utils] - return pd.concat(blockvals, axis=1)
    [11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 846, in concat
    [11:57:18] [INFO] [dku.utils] - return op.get_result()
    [11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 1038, in get_result
    [11:57:18] [INFO] [dku.utils] - copy=self.copy)
    [11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 4545, in concatenate_block_managers
    [11:57:18] [INFO] [dku.utils] - for placement, join_units in concat_plan]
    [11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 4648, in concatenate_join_units
    [11:57:18] [INFO] [dku.utils] - concat_values = concat_values.copy()
    [11:57:18] [INFO] [dku.utils] - MemoryError
    [11:57:18] [INFO] [dku.flow.activity] - Run thread failed for activity score_Companies_unlabelled_AI_prepared_NP
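
    For readers wondering what batch scoring looks like in general, here is a minimal sketch in plain Python. It is not the DSS implementation: `iter_batches`, `score_in_batches`, and `toy_predict` are hypothetical names, and `toy_predict` is a stand-in for a real model. The point is only that each batch is transformed and scored on its own, so peak memory depends on the batch size rather than on the full 250k rows.

```python
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def score_in_batches(rows, predict, batch_size=10_000):
    """Score rows batch by batch.

    predict is assumed to map a list of rows to a list of labels
    of the same length (hypothetical interface).
    """
    for batch in iter_batches(rows, batch_size):
        for label in predict(batch):
            yield label


def toy_predict(batch):
    """Stand-in model for illustration: flags rows with more than two tags."""
    return [len(tags.split()) > 2 for tags in batch]


rows = ["a b c", "a", "a b c d", "b c"]
labels = list(score_in_batches(rows, toy_predict, batch_size=3))
print(labels)
```

    If the per-batch feature matrix is still too large, a smaller `batch_size` lowers the peak at the cost of more calls to `predict`.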