Building many identical models for different columns
Hi everyone,
I have a dataset of products with a few features and many binary classes that a product can belong to.
Every class is represented by a binary column. For every class I would like to build a prediction model that uses the features and the other class columns as inputs to predict that class's label, so that I can find products that are sorted into the wrong class. All models follow the same procedure: I use XGBoost and the same feature handling, except for the one column I want to predict. In the picture you can see three such models; they are evaluated, and the wrong predictions are stacked in the stack recipe.
Is it possible to use Python code to generate these models and create the evaluation recipes automatically?
Best Answer
Yes, I think that is possible.
The Dataiku API documentation covers doing machine learning through the API. The page https://doc.dataiku.com/dss/latest/python-api/ml.html#the-whole-cycle shows an example of creating a new machine learning task (like a model you create in a Visual Analysis), modifying its settings and deploying it to the Flow. Through the API you can also configure how each feature is handled and which algorithms are used.
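Following the pattern on that page, the loop over class columns could look roughly like this. It is a minimal sketch assuming a recent `dataikuapi`, where `project` is a `DSSProject` handle; the function names, the `rejected_features` parameter and the `model_<column>` naming are hypothetical, and the algorithm keys (`XGBOOST_CLASSIFICATION` etc.) and the set of algorithms DSS enables by default may differ between DSS versions.

```python
# Minimal sketch: one XGBoost prediction ML task per binary class column,
# driven through the Dataiku public API (dataikuapi).


def build_model_for_column(project, dataset_name, target, rejected_features=()):
    """Train an XGBoost model predicting `target` and deploy it to the Flow.

    Returns the id of the deployed saved model.
    """
    # The target column is excluded from the inputs automatically; all other
    # columns (features and the other class columns) keep the handling DSS guesses.
    mltask = project.create_prediction_ml_task(
        input_dataset=dataset_name,
        target_variable=target,
        ml_backend_type="PY_MEMORY",
        guess_policy="DEFAULT",
    )
    mltask.wait_guess_complete()

    settings = mltask.get_settings()
    # Columns that should never be inputs (e.g. a hypothetical product id).
    for feature in rejected_features:
        settings.reject_feature(feature)
    # Assumption: these are the algorithms enabled by default for
    # classification; switch them off and keep only XGBoost.
    settings.set_algorithm_enabled("RANDOM_FOREST_CLASSIFICATION", False)
    settings.set_algorithm_enabled("LOGISTIC_REGRESSION", False)
    settings.set_algorithm_enabled("XGBOOST_CLASSIFICATION", True)
    settings.save()

    mltask.start_train()
    mltask.wait_train_complete()
    # Only one algorithm is enabled on a fresh task, so take the first model.
    model_id = mltask.get_trained_models_ids()[0]

    result = mltask.deploy_to_flow(model_id, "model_" + target, dataset_name)
    return result["savedModelId"]


def build_all_models(project, dataset_name, class_columns, rejected_features=()):
    """Build one model per class column; returns {class column: saved model id}."""
    return {
        target: build_model_for_column(project, dataset_name, target, rejected_features)
        for target in class_columns
    }
```

Inside DSS (for example in a notebook), the project handle can be obtained with `dataiku.api_client().get_project("MYPROJECT")`, where `MYPROJECT` is a placeholder for your project key. The saved model ids returned by `deploy_to_flow` are what an evaluation recipe takes as its model input.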
Based on that example and the API documentation on the same page, you should be able to solve your problem.
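For the evaluation recipes, `dataikuapi` has an evaluation recipe creator that takes a saved model as input. A minimal sketch, assuming a DSS version where `project.new_recipe("evaluation", ...)` and `project.new_managed_dataset(...)` are available, and a connection named `filesystem_managed` (an assumption; use one that exists on your instance). The function name and the output dataset names are hypothetical. The recipe's outputs have to exist before it is built, so the sketch creates them as managed datasets first.

```python
# Minimal sketch: create an evaluation recipe for one deployed saved model.


def create_evaluation_recipe(project, saved_model_id, eval_dataset, name,
                             connection="filesystem_managed"):
    """Create managed output datasets and an evaluation recipe; return the recipe."""
    scored_name = name + "_scored"
    metrics_name = name + "_metrics"

    # The evaluation recipe's outputs must already exist.
    # Assumption: `connection` names a managed-dataset connection on your instance.
    for output in (scored_name, metrics_name):
        builder = project.new_managed_dataset(output)
        builder.with_store_into(connection)
        builder.create()

    creator = project.new_recipe("evaluation", "evaluate_" + name)
    creator.with_input_model(saved_model_id)   # the deployed saved model
    creator.with_input(eval_dataset)           # the dataset to evaluate on
    creator.with_output(scored_name)           # rows with their predictions
    creator.with_output_metrics(metrics_name)  # performance metrics
    return creator.build()
```

Call it once per saved model, for example with the `savedModelId` returned by `deploy_to_flow`; the scored datasets can then feed the stack recipe.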