How to add feature selection to a custom prediction algorithm plugin
Hi,
I have built a simple XGBoost classifier algorithm by closely following https://doc.dataiku.com/dss/7.0/plugins/reference/prediction-algorithms.html
However, I would like to add a UI option that lets the user choose which columns are included as Xs (features) when the algorithm is applied. I suspect that the part below needs another param from the JSON for selecting columns (probably used in fit()), but I can't find any examples in the docs or on GitHub showing how to implement this, and I consider myself a beginner in Python.
def get_clf(self):
    """
    This method must return a scikit-learn compatible model, ie:
    - have a fit(X,y) and predict(X) methods. If sample weights
      are enabled for this algorithm (in algo.json), the fit method
      must have instead the signature fit(X, y, sample_weight=None)
    - have a get_params() and set_params(**params) methods
    """
Answers
Hi,
Even for custom models, once the plugin component is added, the algorithm is available in the visual ML Lab like any other algorithm, so DSS handles the features for you. You can select features with the built-in DSS feature handling in the Design section, so there is no need to list the features/columns explicitly in the model itself or to modify the UI to select columns.
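If you still want to see what column selection inside the model could look like, here is a minimal sketch of a wrapper that follows the contract from the get_clf() docstring in the question (fit, predict, get_params, set_params). The names ColumnSubsetClassifier, MajorityClassifier, base_estimator and column_indices are hypothetical and not part of the DSS API. The sketch assumes X arrives as a list of rows; with a NumPy array you would probably slice with X[:, column_indices] instead. Also keep in mind that DSS likely preprocesses the features before calling fit(), so positions in X may not map one-to-one to your dataset columns, which is another reason to prefer the built-in feature handling.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in for a real model such as an XGBoost classifier."""

    def fit(self, X, y, sample_weight=None):
        # Remember how many columns were received and the most frequent label.
        self.seen_width = len(X[0])
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_] * len(X)


class ColumnSubsetClassifier:
    """Trains and predicts on a fixed subset of column positions (illustrative only)."""

    def __init__(self, base_estimator=None, column_indices=None):
        self.base_estimator = base_estimator
        self.column_indices = column_indices

    def _select(self, X):
        # None means "keep every column".
        if self.column_indices is None:
            return X
        return [[row[i] for i in self.column_indices] for row in X]

    def fit(self, X, y, sample_weight=None):
        X_sub = self._select(X)
        if sample_weight is None:
            self.base_estimator.fit(X_sub, y)
        else:
            self.base_estimator.fit(X_sub, y, sample_weight=sample_weight)
        return self

    def predict(self, X):
        return self.base_estimator.predict(self._select(X))

    def get_params(self, deep=True):
        return {
            "base_estimator": self.base_estimator,
            "column_indices": self.column_indices,
        }

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


# Usage: keep only the first and third columns.
model = ColumnSubsetClassifier(MajorityClassifier(), column_indices=[0, 2])
model.fit([[1, 9, 3], [4, 9, 6], [7, 9, 8]], ["a", "b", "a"])
predictions = model.predict([[0, 0, 0]])
```

In a plugin, get_clf() would return an instance of such a wrapper, and column_indices would have to come from a parameter you define yourself; how that parameter is exposed in the UI is not covered by this sketch.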
Best,
Kasim