Exporting preprocessing alongside model?

nemo Registered Posts: 1
edited September 18 in Using Dataiku

Hi, I am currently trying to export and deploy an ML model (LGBM) created and trained in Dataiku to GitHub. The export works just fine, but I'm not sure how to apply the model's preprocessor to my validation dataset. I've searched online and noticed some preprocessing functionality in the dataiku-scoring package used for the model, but I can't figure out how to preprocess and deploy. Every time I preprocess the data and then predict on the transformed data, an error is thrown saying that the model expects the same number of columns as the original dataset, before any transformations have occurred. Does preprocessing happen within the .predict() function of the model, or am I missing something?

Here's an example of the Python code I am trying (and failing with):

X_numeric, X_nonumeric = lgbm_model.prepare_input.process(X_test)
preprocessed = lgbm_model.preprocessings.process(X_numeric, X_nonumeric)
lgbm_model.predict(preprocessed)

ValueError: Invalid input size, got n_columns=1031 instead of 55.

Any help would be greatly appreciated!

Answers

  • Alexandru
    Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,337

    Hi @nemo ,

    Welcome to the Dataiku Community!

    Indeed, that error is not expected with the method you shared. You can use predictor.preprocess to pre-process the data, as shown here:
    https://knowledge.dataiku.com/latest/ml-analytics/model-results/tutorial-export-preprocessed-data.html#preprocess-the-input-dataframe

    If this is not working for you, I would suggest raising a support ticket with job diagnostics. For example, run your code in a test Python recipe, generate the job diagnostics, and attach them to the support ticket.

    One possibility is that you are using a Prepare script within Visual ML that does additional pre-processing. That part is not included in the models exported to Python, which are the ones you can use with the Dataiku Scoring library.

    https://doc.dataiku.com/dss/latest/machine-learning/models-export.html#limitations
    https://doc.dataiku.com/dss/latest/machine-learning/models-export.html#export-to-python
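
For anyone hitting the same "Invalid input size" message, here is a toy sketch of one plausible cause. It assumes, without confirmation in this thread, that predict() on the exported model runs the preprocessing itself and therefore expects the raw columns. ToyExportedModel, its methods, and the scoring rule are all hypothetical stand-ins written for illustration; none of this is the dataiku-scoring API.

```python
# Toy stand-in for an exported model. NOT the dataiku-scoring API:
# every name here is hypothetical and exists only for illustration.
class ToyExportedModel:
    """Hypothetical model whose predict() preprocesses raw rows itself."""

    def __init__(self, categories):
        self.categories = categories  # known values of the categorical column
        self.n_raw_columns = 2        # raw input layout: [numeric, category]

    def preprocess(self, rows):
        # One-hot encode the categorical column, which widens each row.
        out = []
        for numeric, category in rows:
            one_hot = [1.0 if category == c else 0.0 for c in self.categories]
            out.append([float(numeric)] + one_hot)
        return out

    def predict(self, rows):
        # Assumption: predict() expects RAW rows and preprocesses internally.
        for row in rows:
            if len(row) != self.n_raw_columns:
                raise ValueError(
                    "Invalid input size, got n_columns=%d instead of %d."
                    % (len(row), self.n_raw_columns)
                )
        features = self.preprocess(rows)
        # Dummy scoring rule so the example stays self-contained.
        return [sum(f) for f in features]


model = ToyExportedModel(categories=["a", "b", "c"])
raw_rows = [[1.5, "a"], [2.0, "c"]]

# Works: raw rows go in, preprocessing happens inside predict().
predictions = model.predict(raw_rows)

# Fails: the rows were already widened from 2 to 4 columns.
try:
    model.predict(model.preprocess(raw_rows))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

If the real model behaves like this toy, passing the untransformed X_test straight to predict() would be the thing to try. Any Prepare script steps would still have to be reproduced by hand first, since they are not part of the export.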

