Manipulation and documentation of custom model
Hi there,
I built a model in Visual ML using a self-developed custom model plugin (CatBoost).
I also successfully deployed the resulting model to the Flow.
I would love to be able to do two things with this model:
1. Export model documentation from the Visual ML model summary using "Export model summary". This throws errors with the custom plugin, even though all the information that would end up in the document seems to be available on the model summary page. Is there any way to get this to work?
2. Load the model object in a Python notebook using the Python API, so that I can use the predictor for other tasks (for example, calculating permutation importance).
Normally I'd do it like this:
# Retrieve trained model object
model = dataiku.Model("qC0ANLpX")
predictor = model.get_predictor()
clf = predictor._clf
But that throws an error, presumably because the model is not a model type that Dataiku's internals recognize. Is there another way to get the predictor (or the underlying scikit-learn-style model object) from the saved model in the Flow?
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-3-c3a6a531446f> in <module>
      1 # Retrieve trained model object
      2 model = dataiku.Model("qC0ANLpX")
----> 3 predictor = model.get_predictor()
      4 clf = predictor._clf
      5

/opt/dss/dataiku-dss-10.0.4/python/dataiku/core/saved_model.py in get_predictor(self, version_id)
    206 model_folder = target_model_folder
    207
--> 208 self._predictors[version_id] = build_predictor_for_saved_model(model_folder, self.get_type(), sm.get("conditionalOutputs", []))
    209 return self._predictors[version_id]
    210

/opt/dss/dataiku-dss-10.0.4/python/dataiku/core/saved_model.py in build_predictor_for_saved_model(model_folder, model_type, conditional_outputs)
    331 from dataiku.doctor.utils.split import get_saved_model_resolved_split_desc
    332 split_desc = get_saved_model_resolved_split_desc(model_folder)
--> 333 return build_predictor(model_type, model_folder, model_folder, conditional_outputs, core_params, split_desc)
    334
    335

/opt/dss/dataiku-dss-10.0.4/python/dataiku/core/saved_model.py in build_predictor(model_type, model_folder, preprocessing_folder, conditional_outputs, core_params, split_desc, train_split_desc)
    396 pkl_path = osp.join(model_folder, "clf.pkl" if is_prediction else "clusterer.pkl")
    397 with open(pkl_path, "rb") as f:
--> 398 clf = pickle.load(f)
    399 try:
    400 logger.info("Post-processing model")

ModuleNotFoundError: No module named 'modelcatboost'
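The last line of the traceback is the telling part. A pickle file stores only a reference to the model's class (its module name and class name), not the class code itself, so `pickle.load` has to import that module when it reads `clf.pkl`. Here the module is `modelcatboost` (presumably the plugin's own Python module), and the notebook's Python process cannot find it. A minimal, self-contained sketch of that mechanism, using a throwaway module named `demo_plugin_model` as a hypothetical stand-in for the plugin module (no DSS code involved):

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Write a throwaway module defining a model class (hypothetical stand-in for a plugin module).
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "demo_plugin_model.py").write_text(
    "class DemoModel:\n"
    "    def __init__(self, depth):\n"
    "        self.depth = depth\n"
)

# While tmp_dir is on sys.path the module is importable, so pickling works.
sys.path.insert(0, str(tmp_dir))
import demo_plugin_model

payload = pickle.dumps(demo_plugin_model.DemoModel(depth=6))

# Simulate another process in which the module cannot be found.
sys.path.remove(str(tmp_dir))
del sys.modules["demo_plugin_model"]
importlib.invalidate_caches()

# Unpickling must import "demo_plugin_model" to rebuild the object, so it fails.
try:
    pickle.loads(payload)
    error = None
except ModuleNotFoundError as exc:
    error = exc

print(error)  # No module named 'demo_plugin_model'
```

This is also why installing the third-party package alone would not necessarily help: the missing name is the module that defines the pickled class.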
Best Answer
Antal · Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered, Neuron 2023 · Posts: 91
Thanks for the tip. That wasn't the issue...
I've managed to figure out a workaround, actually.
Neither feature could find the custom model code, because both are executed in Dataiku's global environment rather than from within the plugin's environment.
I've added the custom CatBoost code to the Global code library. That makes the custom model code available globally, so these features can find it. This way, I was able to get model object manipulation, model documentation generation, and model views (error analysis, fairness report) working with models that use the custom model plugin.
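A rough sketch of why this workaround can succeed, assuming the Global code library's Python folder is on the import path of the DSS Python processes involved: once a module with the same name and the same class definitions is importable, `pickle` can resolve the class again. The sketch simulates this with temporary directories and a hypothetical module `demo_shared_model`; it does not use any DSS API.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

MODULE_SOURCE = (
    "class DemoModel:\n"
    "    def __init__(self, depth):\n"
    "        self.depth = depth\n"
)

# "Plugin" location: where the class lives when the model is trained and pickled.
plugin_dir = Path(tempfile.mkdtemp())
(plugin_dir / "demo_shared_model.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(plugin_dir))
import demo_shared_model

payload = pickle.dumps(demo_shared_model.DemoModel(depth=6))

# Later, a process that cannot see the plugin directory.
sys.path.remove(str(plugin_dir))
del sys.modules["demo_shared_model"]

# Workaround: a copy of the same module in a directory that IS on sys.path
# (standing in for the global code library).
shared_lib_dir = Path(tempfile.mkdtemp())
(shared_lib_dir / "demo_shared_model.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(shared_lib_dir))
importlib.invalidate_caches()

# pickle now imports the shared copy and rebuilds the object from it.
restored = pickle.loads(payload)
print(type(restored).__module__, restored.depth)  # demo_shared_model 6
```

The trade-off, as noted, is that the plugin code now exists in two places, so the two copies have to be kept in sync or older pickled models may stop loading.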
It's not exactly pretty, but it does work!
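With the predictor loading, the permutation-importance use case from the question could look roughly like the sketch below, which uses scikit-learn's `permutation_importance`. It has not been tested against a CatBoost plugin model: the random forest and synthetic data are stand-ins, and it assumes that `predictor._clf` (a private attribute) exposes a scikit-learn-compatible `predict_proba` interface and is fed features that are already preprocessed the way the model was trained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Stand-in data and model; in DSS these would be the preprocessed features and predictor._clf.
X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each feature column n_repeats times and record the drop in ROC AUC.
result = permutation_importance(
    clf, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)

# Print features from most to least important.
for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature {idx}: {result.importances_mean[idx]:.4f} +/- {result.importances_std[idx]:.4f}")
```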
Answers
Turribeach · Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered, Neuron 2023 · Posts: 2,112
For the second issue, make sure the catboost package is installed in the Python code environment that your Jupyter notebook uses.
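One way to tell the possible causes apart from inside the notebook is to check which interpreter the kernel is running and whether each module can be found. Here `catboost` is the third-party package and `modelcatboost` is the module named in the traceback. A small sketch using only the standard library (`check_modules` is a hypothetical helper name):

```python
import importlib.util
import sys


def check_modules(names):
    """Return {module_name: bool} saying whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Which Python executable (and therefore which code environment) the kernel uses.
print("Interpreter:", sys.executable)

for name, found in check_modules(["catboost", "modelcatboost"]).items():
    print(f"{name}: {'found' if found else 'NOT found'}")
```

If `catboost` is found but `modelcatboost` is not, the package install is not the problem; the module that defines the pickled class is what the notebook is missing.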
https://doc.dataiku.com/dss/latest/python/packages.html