Selecting features in ML task
I'm training a set of models as shown below. I want to include only one variable, 'feature1', for training, but it appears that all the columns in the data are used. How do I include only this feature while training?
if trained_model_MAPE > ERROR_THRESHOLD:
    # Wait for the ML task to be ready
    mltask.wait_guess_complete()

    # Obtain settings, enable GBT, and save settings
    settings = mltask.get_settings()
    settings.set_algorithm_enabled("GBT_REGRESSION", True)
    settings.use_feature('feature1')
    settings.save()

    # Start training and wait for it to be complete
    mltask.start_train()
    mltask.wait_train_complete()

    # Get the identifiers of the trained models
    # There will be 3 of them because Logistic regression and Random forest were default enabled, plus GBT enabled above
    ids = mltask.get_trained_models_ids()
    mape_list = []
    for id in ids:
        details = mltask.get_trained_model_details(id)
        algorithm = details.get_modeling_settings()["algorithm"]
        mape = details.get_performance_metrics()["mape"]
        print(f"Algorithm={algorithm} MAPE={mape}")
        mape_list.append(mape)
Operating system used: Windows
Best Answer
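As with algorithms, other features are enabled by default. Calling use_feature('feature1') does not disable them, so all of them are still used for training. You can use reject_feature to disable the ones you don't want.
For instance, using foreach_feature to iterate over all features: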
features_to_use = ['feature1']
features_to_reject = []

def handle_feature(feature_name, feature_params):
    if feature_name not in features_to_use and feature_params["role"] == 'INPUT':
        features_to_reject.append(feature_name)
    return feature_params

settings.foreach_feature(handle_feature)

for feature_name in features_to_use:
    settings.use_feature(feature_name)
for feature_name in features_to_reject:
    settings.reject_feature(feature_name)
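To check which columns the loop above would reject without touching a real ML task, here is a minimal sketch on a plain dictionary. The `per_feature` layout (a `role` key holding `INPUT`, `TARGET`, or `REJECT`), the column names, and the helper `features_to_reject_for` are simplified, hypothetical stand-ins for illustration, not the actual DSS settings object.

```python
def features_to_reject_for(per_feature, features_to_use):
    """Return the names of INPUT features that are not in features_to_use."""
    return [
        name
        for name, params in per_feature.items()
        if name not in features_to_use and params["role"] == "INPUT"
    ]


# Hypothetical per-feature settings: three inputs, the target, one already rejected.
per_feature = {
    "feature1": {"role": "INPUT"},
    "feature2": {"role": "INPUT"},
    "feature3": {"role": "INPUT"},
    "price": {"role": "TARGET"},
    "row_id": {"role": "REJECT"},
}

rejected = features_to_reject_for(per_feature, ["feature1"])
print(rejected)
```

Only the other input columns end up in the list. The target and the already-rejected column are left alone, which is the point of the role check.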
Answers
@AdrienL
I'm facing the following error with the above solution:
DataikuException: com.dataiku.dip.exceptions.DSSInternalErrorException: Internal error, caused by: NullPointerException: null
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/opt/dataiku-dss-12.5.1/python/dataikuapi/dssclient.py in _perform_http(self, method, path, params, body, stream, files, raw_body, headers)
   1450                     headers=headers)
-> 1451             http_res.raise_for_status()
   1452             return http_res

/opt/dataiku-dss-12.5.1/python39.packages/requests/models.py in raise_for_status(self)
   1020         if http_error_msg:
-> 1021             raise HTTPError(http_error_msg, response=self)
   1022

HTTPError: 500 Server Error: Server Error for url: http://127.0.0.1:10001/dip/publicapi/projects/PRICINGPOWERMODELS/models/lab/P6zS29Fv/PlwkqGVX/settings

During handling of the above exception, another exception occurred:

DataikuException                          Traceback (most recent call last)
<ipython-input-367-3e542e8df9de> in <module>
     17 settings.foreach_feature(handle_feature)
     18
---> 19 settings.save()
     20
     21 # Start training and wait for it to be complete

/opt/dataiku-dss-12.5.1/python/dataikuapi/dss/ml.py in save(self)
    600         """
    601
--> 602         self.client._perform_empty(
    603                 "POST", "/projects/%s/models/lab/%s/%s/settings" % (self.project_key, self.analysis_id, self.mltask_id),
    604                 body = self.mltask_settings)

/opt/dataiku-dss-12.5.1/python/dataikuapi/dssclient.py in _perform_empty(self, method, path, params, body, files, raw_body)
   1459
   1460     def _perform_empty(self, method, path, params=None, body=None, files = None, raw_body=None):
-> 1461         self._perform_http(method, path, params=params, body=body, files=files, stream=False, raw_body=raw_body)
   1462
   1463     def _perform_text(self, method, path, params=None, body=None,files=None, raw_body=None):

/opt/dataiku-dss-12.5.1/python/dataikuapi/dssclient.py in _perform_http(self, method, path, params, body, stream, files, raw_body, headers)
   1456             except ValueError:
   1457                 ex = {"message": http_res.text}
-> 1458             raise DataikuException("%s: %s" % (ex.get("errorType", "Unknown error"), ex.get("detailedMessage", ex.get("message", "No message"))))
   1459
   1460     def _perform_empty(self, method, path, params=None, body=None, files = None, raw_body=None):

DataikuException: com.dataiku.dip.exceptions.DSSInternalErrorException: Internal error, caused by: NullPointerException: null
Yeah, I read the doc too fast: it states that the function passed to foreach_feature (handle_feature here) is supposed to return the feature parameters. Also, one should only reject input features; otherwise we risk rejecting the target (not a good idea). I rewrote the code above and rearranged it for clarity.
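The NullPointerException is consistent with that. If foreach_feature stores whatever the callback returns as the feature's new parameters, a callback with no return statement replaces every entry with None, and saving the settings then fails on the server. Here is a minimal sketch of that behavior. `foreach_feature_sketch` is a hypothetical stand-in written for illustration (an assumption about how the method behaves, not the dataikuapi code itself):

```python
import copy


def foreach_feature_sketch(per_feature, fn):
    """Stand-in: replace each feature's params with the callback's return value."""
    for name, params in list(per_feature.items()):
        per_feature[name] = fn(name, params)


def callback_without_return(feature_name, feature_params):
    pass  # no return statement, so Python returns None


def callback_with_return(feature_name, feature_params):
    return feature_params  # hand the (possibly modified) params back


# Hypothetical, simplified per-feature settings.
original = {"feature1": {"role": "INPUT"}, "feature2": {"role": "INPUT"}}

broken = copy.deepcopy(original)
foreach_feature_sketch(broken, callback_without_return)

kept = copy.deepcopy(original)
foreach_feature_sketch(kept, callback_with_return)

print(broken)
print(kept)
```

With the first callback every feature's settings are wiped out, while the second leaves them intact, which is why handle_feature above ends with return feature_params.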