Selecting features in ML task

MNOP
Level 3

I'm training a set of models as shown below. I want to use only one variable, 'feature1', for training, but it appears that all the columns in the data are used. How do I include only this feature when training?

if trained_model_MAPE > ERROR_THRESHOLD:
    # Wait for the ML task to be ready
    mltask.wait_guess_complete()

    # Obtain settings, enable GBT, and save settings
    settings = mltask.get_settings()
    settings.set_algorithm_enabled("GBT_REGRESSION", True)
    settings.use_feature('feature1')
    settings.save()

    # Start training and wait for it to be complete
    mltask.start_train()
    mltask.wait_train_complete()

    # Get the identifiers of the trained models
    # There will be 3 of them because Logistic regression and Random forest were default enabled, plus GBT enabled above
    ids = mltask.get_trained_models_ids()
    mape_list = []
    for id in ids:
        details = mltask.get_trained_model_details(id)
        algorithm = details.get_modeling_settings()["algorithm"]
        mape = details.get_performance_metrics()["mape"]
        print(f"Algorithm={algorithm} MAPE={mape}")
        mape_list.append(mape)


Operating system used: Windows

AdrienL
Dataiker

As with algorithms, the other features are already enabled by default. Calling use_feature on 'feature1' enables that feature but does not disable the others. You can use reject_feature to disable them.

For instance, you can use foreach_feature to iterate over all features:

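features_to_use = ['feature1']
features_to_reject = []

def handle_feature(feature_name, feature_params):
    # Only collect input features, so that the target is never rejected
    if feature_name not in features_to_use and feature_params["role"] == 'INPUT':
        features_to_reject.append(feature_name)
    # foreach_feature expects the callback to return the feature parameters
    return feature_params

# Collect the names of the features to reject
settings.foreach_feature(handle_feature)

# Enable the wanted features and reject all other input features
for feature_name in features_to_use:
    settings.use_feature(feature_name)
for feature_name in features_to_reject:
    settings.reject_feature(feature_name)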
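To see the effect of this logic without a running DSS instance, here is a minimal sketch that applies the same rule to a plain dictionary. The function name restrict_inputs and the sample column names are hypothetical, and the role values "REJECT" and "TARGET" are assumptions about what DSS stores (only "INPUT" appears in the code above). It is an illustration of the rule, not the library's implementation.

```python
import copy


def restrict_inputs(per_feature, features_to_use):
    """Return a copy of per_feature in which only features_to_use stay inputs.

    per_feature is a plain dict {feature_name: {"role": ...}} standing in for
    the ML task's per-feature settings (hypothetical layout for illustration).
    """
    missing = [name for name in features_to_use if name not in per_feature]
    if missing:
        raise KeyError(f"Unknown features: {missing}")

    result = copy.deepcopy(per_feature)
    for name, params in result.items():
        if name in features_to_use:
            # Same intent as settings.use_feature(name)
            params["role"] = "INPUT"
        elif params["role"] == "INPUT":
            # Same intent as settings.reject_feature(name); assumed role value
            params["role"] = "REJECT"
        # Any other role (for example the target) is left untouched
    return result


example = {
    "feature1": {"role": "INPUT"},
    "feature2": {"role": "INPUT"},
    "price": {"role": "TARGET"},  # assumed role name for the target column
}
restricted = restrict_inputs(example, ["feature1"])
print({name: params["role"] for name, params in restricted.items()})
# → {'feature1': 'INPUT', 'feature2': 'REJECT', 'price': 'TARGET'}
```

Only columns whose role is 'INPUT' are ever switched off, which is why the target keeps its role.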

 

 

 

MNOP
Level 3
Author

@AdrienL I'm getting the following error with the above solution:

DataikuException: com.dataiku.dip.exceptions.DSSInternalErrorException: Internal error, caused by: NullPointerException: null

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/opt/dataiku-dss-12.5.1/python/dataikuapi/dssclient.py in _perform_http(self, method, path, params, body, stream, files, raw_body, headers)
   1450                     headers=headers)
-> 1451             http_res.raise_for_status()
   1452             return http_res

/opt/dataiku-dss-12.5.1/python39.packages/requests/models.py in raise_for_status(self)
   1020         if http_error_msg:
-> 1021             raise HTTPError(http_error_msg, response=self)
   1022

HTTPError: 500 Server Error: Server Error for url: http://127.0.0.1:10001/dip/publicapi/projects/PRICINGPOWERMODELS/models/lab/P6zS29Fv/PlwkqGVX/settin...

During handling of the above exception, another exception occurred:

DataikuException                          Traceback (most recent call last)
<ipython-input-367-3e542e8df9de> in <module>
     17     settings.foreach_feature(handle_feature)
     18
---> 19     settings.save()
     20
     21     # Start training and wait for it to be complete

/opt/dataiku-dss-12.5.1/python/dataikuapi/dss/ml.py in save(self)
    600         """
    601
--> 602         self.client._perform_empty(
    603                 "POST", "/projects/%s/models/lab/%s/%s/settings" % (self.project_key, self.analysis_id, self.mltask_id),
    604                 body = self.mltask_settings)

/opt/dataiku-dss-12.5.1/python/dataikuapi/dssclient.py in _perform_empty(self, method, path, params, body, files, raw_body)
   1459
   1460     def _perform_empty(self, method, path, params=None, body=None, files = None, raw_body=None):
-> 1461         self._perform_http(method, path, params=params, body=body, files=files, stream=False, raw_body=raw_body)
   1462
   1463     def _perform_text(self, method, path, params=None, body=None,files=None, raw_body=None):

/opt/dataiku-dss-12.5.1/python/dataikuapi/dssclient.py in _perform_http(self, method, path, params, body, stream, files, raw_body, headers)
   1456             except ValueError:
   1457                 ex = {"message": http_res.text}
-> 1458             raise DataikuException("%s: %s" % (ex.get("errorType", "Unknown error"), ex.get("detailedMessage", ex.get("message", "No message"))))
   1459
   1460     def _perform_empty(self, method, path, params=None, body=None, files = None, raw_body=None):

DataikuException: com.dataiku.dip.exceptions.DSSInternalErrorException: Internal error, caused by: NullPointerException: null

AdrienL
Dataiker

Yeah, I read the doc too fast: it states that the handle_feature function is supposed to return the feature parameters. My first version did not return them, which is most likely what led to the NullPointerException when saving the settings. Also, one should only reject input features, otherwise we risk rejecting the target (not a good idea). I rewrote the code above and rearranged it for clarity.
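To illustrate why the missing return matters, here is a minimal sketch on plain dictionaries. It assumes that foreach_feature replaces each feature's parameters with whatever the callback returns, as the documentation statement suggests. The helpers apply_callback and features_with_missing_params are hypothetical stand-ins written for this example, not dataikuapi functions.

```python
def apply_callback(per_feature, callback):
    """Hypothetical stand-in for a foreach-style update: each feature's
    parameters are replaced by the callback's return value."""
    return {name: callback(name, dict(params)) for name, params in per_feature.items()}


def features_with_missing_params(per_feature):
    """List the features whose parameters were lost (set to None)."""
    return [name for name, params in per_feature.items() if params is None]


def forgets_to_return(feature_name, feature_params):
    # Inspects the parameters but has no return statement,
    # so Python implicitly returns None
    _ = feature_params["role"]


def returns_params(feature_name, feature_params):
    # Correct shape: hand the (possibly modified) parameters back
    return feature_params


example = {"feature1": {"role": "INPUT"}, "feature2": {"role": "INPUT"}}

broken = apply_callback(example, forgets_to_return)
fixed = apply_callback(example, returns_params)

print(features_with_missing_params(broken))  # → ['feature1', 'feature2']
print(features_with_missing_params(fixed))   # → []
```

Under that assumption, a callback without a return statement would leave every feature with empty parameters, which would be consistent with the server failing on settings.save().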
