Selecting features in ML task

Mohammed Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 40 ✭✭✭
edited July 16 in Using Dataiku

I'm training a set of models with the code below. I want to use only one variable, 'feature1', for training, but it appears that all the columns in the data are used. How do I include only this feature when training?

if trained_model_MAPE > ERROR_THRESHOLD:
    # Wait for the ML task to be ready
    mltask.wait_guess_complete()

    # Obtain settings, enable GBT, select the feature, and save settings
    settings = mltask.get_settings()
    settings.set_algorithm_enabled("GBT_REGRESSION", True)
    settings.use_feature('feature1')
    settings.save()

    # Start training and wait for it to be complete
    mltask.start_train()
    mltask.wait_train_complete()

    # Get the identifiers of the trained models
    # There is one per enabled algorithm: the defaults plus GBT enabled above
    ids = mltask.get_trained_models_ids()

    mape_list = []
    for model_id in ids:
        details = mltask.get_trained_model_details(model_id)
        algorithm = details.get_modeling_settings()["algorithm"]
        mape = details.get_performance_metrics()["mape"]
        print(f"Algorithm={algorithm} MAPE={mape}")
        mape_list.append(mape)


Operating system used: Windows

Best Answer

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker
    edited July 17 Answer ✓

    As with algorithms, other features are enabled by default, so calling use_feature on one feature does not disable the rest. You can use reject_feature to disable them.

    For instance, use foreach_feature to iterate over all features:

    features_to_use = ['feature1']
    features_to_reject = []
    def handle_feature(feature_name, feature_params):
        if feature_name not in features_to_use and feature_params["role"] == 'INPUT':
            features_to_reject.append(feature_name)
        return feature_params
    
    settings.foreach_feature(handle_feature)
    for feature_name in features_to_use:
        settings.use_feature(feature_name)
    for feature_name in features_to_reject:
        settings.reject_feature(feature_name)
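
The selection rule in the snippet above can be tried without a DSS instance. The following is a minimal sketch, assuming the per-feature settings behave like a dictionary mapping each feature name to parameters with a "role" key; the `per_feature` dictionary, its column names, and the `find_features_to_reject` helper are hypothetical stand-ins, not part of the Dataiku API.

```python
# Hypothetical stand-in for the ML task's per-feature settings (illustrative data only).
per_feature = {
    "feature1": {"role": "INPUT"},
    "feature2": {"role": "INPUT"},
    "feature3": {"role": "INPUT"},
    "price": {"role": "TARGET"},
}


def find_features_to_reject(per_feature, features_to_use):
    """Return input features that are not in features_to_use.

    Features whose role is not 'INPUT' (such as the target) are never
    returned, so they cannot be rejected by mistake.
    """
    return [
        name
        for name, params in per_feature.items()
        if params["role"] == "INPUT" and name not in features_to_use
    ]


features_to_reject = find_features_to_reject(per_feature, ["feature1"])
print(features_to_reject)
```

Only the other input columns end up in the rejection list; the target keeps its role because of the "INPUT" check.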
    

Answers

  • Mohammed
    Mohammed Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 40 ✭✭✭

    @AdrienL
    I'm getting the following error with the above solution:

    DataikuException: com.dataiku.dip.exceptions.DSSInternalErrorException: Internal error, caused by: NullPointerException: null

    ---------------------------------------------------------------------------
    HTTPError                                 Traceback (most recent call last)
    /opt/dataiku-dss-12.5.1/python/dataikuapi/dssclient.py in _perform_http(self, method, path, params, body, stream, files, raw_body, headers)
       1450                 headers=headers)
    -> 1451             http_res.raise_for_status()
       1452             return http_res

    /opt/dataiku-dss-12.5.1/python39.packages/requests/models.py in raise_for_status(self)
       1020         if http_error_msg:
    -> 1021             raise HTTPError(http_error_msg, response=self)
       1022

    HTTPError: 500 Server Error: Server Error for url: http://127.0.0.1:10001/dip/publicapi/projects/PRICINGPOWERMODELS/models/lab/P6zS29Fv/PlwkqGVX/settings

    During handling of the above exception, another exception occurred:

    DataikuException                          Traceback (most recent call last)
    <ipython-input-367-3e542e8df9de> in <module>
         17 settings.foreach_feature(handle_feature)
         18
    ---> 19 settings.save()
         20
         21 # Start training and wait for it to be complete

    /opt/dataiku-dss-12.5.1/python/dataikuapi/dss/ml.py in save(self)
        600         """
        601
    --> 602         self.client._perform_empty(
        603                 "POST", "/projects/%s/models/lab/%s/%s/settings" % (self.project_key, self.analysis_id, self.mltask_id),
        604                 body = self.mltask_settings)

    /opt/dataiku-dss-12.5.1/python/dataikuapi/dssclient.py in _perform_empty(self, method, path, params, body, files, raw_body)
       1459
       1460     def _perform_empty(self, method, path, params=None, body=None, files = None, raw_body=None):
    -> 1461         self._perform_http(method, path, params=params, body=body, files=files, stream=False, raw_body=raw_body)
       1462
       1463     def _perform_text(self, method, path, params=None, body=None,files=None, raw_body=None):

    /opt/dataiku-dss-12.5.1/python/dataikuapi/dssclient.py in _perform_http(self, method, path, params, body, stream, files, raw_body, headers)
       1456         except ValueError:
       1457             ex = {"message": http_res.text}
    -> 1458         raise DataikuException("%s: %s" % (ex.get("errorType", "Unknown error"), ex.get("detailedMessage", ex.get("message", "No message"))))
       1459
       1460     def _perform_empty(self, method, path, params=None, body=None, files = None, raw_body=None):

    DataikuException: com.dataiku.dip.exceptions.DSSInternalErrorException: Internal error, caused by: NullPointerException: null

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker

    Yes, I read the documentation too fast: it states that the handle_feature function must return the feature parameters. Also, only input features should be rejected; otherwise we risk rejecting the target, which is not a good idea. I rewrote the code above and rearranged it for clarity.
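
To illustrate why the return value matters, here is a minimal sketch, assuming foreach_feature replaces each feature's parameters with whatever the callback returns. The `foreach_feature_sketch` function and the `per_feature` dictionary are simplified, hypothetical stand-ins rather than the library's actual implementation. Under that assumption, a callback with no return statement replaces every feature's settings with None, which would be consistent with the server-side NullPointerException on save, although the thread does not confirm the exact cause.

```python
import copy

# Hypothetical stand-in for the ML task's per-feature settings (illustrative data only).
per_feature = {
    "feature1": {"role": "INPUT"},
    "feature2": {"role": "INPUT"},
    "price": {"role": "TARGET"},
}


def foreach_feature_sketch(per_feature, fn):
    """Simplified stand-in: store the callback's return value for each feature."""
    return {
        name: fn(name, copy.deepcopy(params))
        for name, params in per_feature.items()
    }


def handler_without_return(feature_name, feature_params):
    # Inspects the feature but forgets to return its parameters,
    # so Python implicitly returns None.
    _ = feature_params["role"]


def handler_with_return(feature_name, feature_params):
    # Returns the parameters so the settings stay intact.
    return feature_params


broken = foreach_feature_sketch(per_feature, handler_without_return)
fixed = foreach_feature_sketch(per_feature, handler_with_return)

print(broken)
print(fixed)
```

With the first handler every value in `broken` is None, while the second handler leaves the settings unchanged.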
