
Manipulation and documentation of custom model

Solved!
Antal
Level 3
Hi there,

I trained a model in Visual ML using a custom model plugin I developed myself (CatBoost).

I also successfully deployed the resulting model to the Flow.

I would love to be able to do two things with this model:

1. Export model documentation from the Visual ML model summary ("Export model summary"). This throws errors with the custom plugin, even though all the information that would end up in the document seems to be available on the model summary page. Is there any way to get this to work?

2. Load the model object in a Python notebook using the Python API, so that I can use the predictor for other tasks (for example, calculating permutation importance).

Normally I'd do it like this:

import dataiku

# Retrieve the trained model object from the saved model in the Flow
model = dataiku.Model("qC0ANLpX")
predictor = model.get_predictor()
# _clf is a private attribute holding the underlying fitted model
clf = predictor._clf

But that throws the error below, I guess because the model is not a model type that Dataiku's internals know about. Is there another way to get the predictor/sklearn model object from the saved Flow model?

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-3-c3a6a531446f> in <module>
      1 # Retrieve trained model object
      2 model = dataiku.Model("qC0ANLpX")
----> 3 predictor = model.get_predictor()
      4 clf = predictor._clf
      5 

/opt/dss/dataiku-dss-10.0.4/python/dataiku/core/saved_model.py in get_predictor(self, version_id)
    206                 model_folder = target_model_folder
    207 
--> 208             self._predictors[version_id] = build_predictor_for_saved_model(model_folder, self.get_type(), sm.get("conditionalOutputs", []))
    209         return self._predictors[version_id]
    210 

/opt/dss/dataiku-dss-10.0.4/python/dataiku/core/saved_model.py in build_predictor_for_saved_model(model_folder, model_type, conditional_outputs)
    331     from dataiku.doctor.utils.split import get_saved_model_resolved_split_desc
    332     split_desc = get_saved_model_resolved_split_desc(model_folder)
--> 333     return build_predictor(model_type, model_folder, model_folder, conditional_outputs, core_params, split_desc)
    334 
    335 

/opt/dss/dataiku-dss-10.0.4/python/dataiku/core/saved_model.py in build_predictor(model_type, model_folder, preprocessing_folder, conditional_outputs, core_params, split_desc, train_split_desc)
    396             pkl_path = osp.join(model_folder, "clf.pkl" if is_prediction else "clusterer.pkl")
    397             with open(pkl_path, "rb") as f:
--> 398                 clf = pickle.load(f)
    399                 try:
    400                     logger.info("Post-processing model")

ModuleNotFoundError: No module named 'modelcatboost'
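The last frame shows where it fails: `pickle.load` on the model's `clf.pkl`. Python pickles store classes by reference (module name plus class name), not by value, so unpickling only works in an interpreter that can import the module that defined the class. The standalone sketch below reproduces the same kind of error; `modelcatboost_demo` and `DemoModel` are hypothetical stand-ins for the plugin's `modelcatboost` module, and the temporary directory merely plays the role of a shared code library.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Write a throwaway module defining a model class. "modelcatboost_demo" is a
# hypothetical stand-in for the plugin's "modelcatboost" module.
lib_dir = tempfile.mkdtemp()
Path(lib_dir, "modelcatboost_demo.py").write_text(
    "class DemoModel:\n"
    "    def predict(self, rows):\n"
    "        return [0 for _ in rows]\n"
)

sys.path.insert(0, lib_dir)
importlib.invalidate_caches()
demo = importlib.import_module("modelcatboost_demo")
payload = pickle.dumps(demo.DemoModel())

# The bytes only name the class ("modelcatboost_demo" + "DemoModel");
# they do not contain its code.
print(b"modelcatboost_demo" in payload)  # True

# Simulate a process whose sys.path does not include the module's directory.
sys.path.remove(lib_dir)
del sys.modules["modelcatboost_demo"]

load_error = None
try:
    pickle.loads(payload)
except ModuleNotFoundError as exc:
    load_error = exc
    print(exc)  # No module named 'modelcatboost_demo'

# Put the directory back on the path and the same bytes load again.
sys.path.insert(0, lib_dir)
restored = pickle.loads(payload)
print(restored.predict([1, 2, 3]))  # [0, 0, 0]
```

This is consistent with the workaround found later in the thread: the missing name is the plugin's own module rather than a third-party package, so the fix is to make that module importable wherever the model is loaded.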

Turribeach
Level 6

For the second issue, make sure the catboost package is installed in the Python code environment that your Jupyter notebook uses.

https://doc.dataiku.com/dss/latest/python/packages.html
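To tell the two possible causes apart from inside the notebook (the `catboost` library missing from the notebook's code environment, or the plugin's own `modelcatboost` module, named in the question's traceback, not being importable), you can ask the import system which modules the current interpreter can find. A minimal sketch; `find_missing` is a hypothetical helper, not part of the Dataiku API.

```python
import importlib.util
import sys


def find_missing(module_names):
    """Return the module names that the current interpreter cannot import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised when the parent package of a dotted name is missing.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Which Python interpreter (and therefore which code environment) is running?
print(sys.executable)

# "catboost" is the third-party library; "modelcatboost" is the module the
# pickled model refers to.
print(find_missing(["catboost", "modelcatboost"]))
```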

Antal
Level 3
Author

Thanks for the tip. That wasn't the issue...

I've managed to figure out a workaround, actually.

Neither feature could find the custom model code, because both run in the Dataiku global environment rather than in the plugin's environment.

I've added the custom CatBoost code to the Global code library, so the custom model code is available everywhere and both features can find it. With that, I got model object manipulation, model documentation generation, and the model views (error analysis, fairness report) working with models that use the custom model plugin.

It's not exactly pretty, but it does work!
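Once `get_predictor()` loads, the permutation importance mentioned in the question can be computed model-agnostically from any prediction function. A minimal sketch, assuming a `predict` callable that takes a pandas DataFrame and returns one prediction per row; wrapping the Dataiku predictor to fit that shape (for example around its `predict` method) is an assumption here, since its output format isn't shown in this thread. The toy data and `toy_predict` are hypothetical. If you have an sklearn-compatible estimator and its preprocessed feature matrix, `sklearn.inspection.permutation_importance` does the same job.

```python
import numpy as np
import pandas as pd


def permutation_importance(predict, X, y, score, n_repeats=5, seed=0):
    """Mean drop in `score` when each column of X is shuffled.

    predict: callable taking a DataFrame, returning one prediction per row
    score:   callable(y_true, y_pred) -> float, where higher is better
    """
    rng = np.random.default_rng(seed)
    baseline = score(y, predict(X))
    result = {}
    for col in X.columns:
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its link with the target.
            X_perm[col] = rng.permutation(X_perm[col].to_numpy())
            drops.append(baseline - score(y, predict(X_perm)))
        result[col] = float(np.mean(drops))
    return pd.Series(result).sort_values(ascending=False)


# Toy usage: the target depends only on "signal", never on "noise".
rng = np.random.default_rng(42)
X = pd.DataFrame({"signal": rng.normal(size=200), "noise": rng.normal(size=200)})
y = 3.0 * X["signal"].to_numpy()


def toy_predict(df):
    return 3.0 * df["signal"].to_numpy()


def neg_mse(y_true, y_pred):
    return -float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


importances = permutation_importance(toy_predict, X, y, neg_mse)
print(importances)
```

Shuffling a column the model ignores leaves its predictions unchanged, so that column scores exactly zero, while shuffling a column the model relies on lowers the score.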