Exporting preprocessing alongside model?

Hi, I am currently trying to export and deploy an ML model (LGBM) created and trained in DataIku to GitHub. The export works fine, but I'm not sure how to apply the model's preprocessor to my validation dataset. I've searched online and noticed that there is some preprocessing functionality in the dataiku-scoring package used for the model, but I can't figure out how to preprocess and deploy. Every time I preprocess the data and then predict on the transformed data, an error is thrown saying that the model expects the same number of columns as the original dataset, before any transformations have occurred. Does preprocessing happen within the .predict() function of the model, or am I missing something?
Here's an example of the Python code I am trying (and failing with):
X_numeric, X_nonumeric = lgbm_model.prepare_input.process(X_test)
preprocessed = lgbm_model.preprocessings.process(X_numeric, X_nonumeric)
lgbm_model.predict(preprocessed)
…
ValueError: Invalid input size, got n_columns=1031 instead of 55.
Any help would be greatly appreciated!
Answers
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,337 Dataiker
Hi @nemo ,
Welcome to the Dataiku Community!
Indeed, the error is not expected for the method you shared. You can use predictor.preprocess to preprocess the data, as shown here:
https://knowledge.dataiku.com/latest/ml-analytics/model-results/tutorial-export-preprocessed-data.html#preprocess-the-input-dataframe
If this is not working for you, I would suggest you raise a support ticket with job diagnostics. For example, run your code in a test Python recipe, generate the job, and attach this to the support ticket.
One possibility is that you are using a Prepare script within Visual ML that does additional preprocessing. That part is not included in the models exported to Python, which are the ones you can use with the Dataiku Scoring library.
https://doc.dataiku.com/dss/latest/machine-learning/models-export.html#limitations
https://doc.dataiku.com/dss/latest/machine-learning/models-export.html#export-to-python
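To illustrate how an error like "got n_columns=1031 instead of 55" can arise, here is a toy sketch. It assumes that the exported model's predict() applies the feature preprocessing itself and therefore expects the raw columns; this thread does not confirm that behavior. ToyExportedModel, its methods, and the sample data are hypothetical stand-ins written for this example, not the dataiku-scoring API.

```python
# Toy stand-in for an exported model; NOT the dataiku-scoring API.
class ToyExportedModel:
    """Hypothetical model whose predict() one-hot encodes raw rows itself."""

    def __init__(self, categories):
        # categories: for each raw column, the list of known values
        self.categories = categories
        self.n_raw_columns = len(categories)

    def preprocess(self, rows):
        """Expand each raw column into one indicator per known value."""
        encoded = []
        for row in rows:
            features = []
            for value, known in zip(row, self.categories):
                features.extend(1.0 if value == k else 0.0 for k in known)
            encoded.append(features)
        return encoded

    def predict(self, rows):
        """Validate the raw width, preprocess internally, then score."""
        for row in rows:
            if len(row) != self.n_raw_columns:
                raise ValueError(
                    "Invalid input size, got n_columns=%d instead of %d."
                    % (len(row), self.n_raw_columns)
                )
        # Placeholder scoring rule: sum of the encoded features.
        return [sum(features) for features in self.preprocess(rows)]


model = ToyExportedModel(categories=[["a", "b", "c"], ["x", "y"]])
raw_rows = [["a", "y"], ["c", "x"]]

# Raw rows have 2 columns, which is what predict() expects.
raw_predictions = model.predict(raw_rows)

# Preprocessing first widens each row to 5 columns, so predict() rejects it.
widened_rows = model.preprocess(raw_rows)
try:
    model.predict(widened_rows)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

If the real exported model behaves like this toy, passing the raw validation dataframe straight to predict() would avoid the size mismatch, and a separate preprocessing call would only be needed to inspect the transformed features.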