API equivalent of "Drop existing sets, recompute new ones" when retraining a model

Solved!
gnaldi62

Hi,

Is there a way to do from code, via the API, the equivalent of the "Drop existing sets, recompute new ones" option shown in the image below? Thanks and regards.

Giuseppe

Immagine 2021-02-23 164200.png

arnaudde
Dataiker

Hello Giuseppe,
There is no supported way to do that.
However, the following code snippet achieves the same result. Beware that this code relies on DSS internals and might stop working in the future.
I have added a request to our backlog to expose this option in the Python API.

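import dataiku
# Assumes the code runs inside DSS (notebook, recipe or scenario)
client = dataiku.api_client()
p = client.get_project('MYPROJECT')
# Arguments are the analysis id and the ML task id
ml_task = p.get_ml_task("KGmcIliw", "PW0l9Nm8")
ml_task_settings = ml_task.get_settings()
# Internal counter: incrementing it is what triggers the recomputation of the train/test sets
ml_task_settings.get_raw()['splitParams']['instanceIdRefresher'] += 1
ml_task_settings.save()
ml_task.start_train()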
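
As a minimal sketch, the counter increment above can be isolated in a small helper that works on the plain dictionary returned by ml_task_settings.get_raw(). The helper name bump_split_refresher and the fallback to 0 when the key is missing are assumptions made for illustration; the splitParams and instanceIdRefresher keys are the DSS internals mentioned above and may change.

```python
def bump_split_refresher(raw_settings):
    """Increment splitParams.instanceIdRefresher in a raw ML task settings dict.

    Returns the new counter value. Starting from 0 when the key is absent is an
    assumption of this sketch, not documented DSS behaviour.
    """
    split_params = raw_settings.setdefault("splitParams", {})
    new_value = split_params.get("instanceIdRefresher", 0) + 1
    split_params["instanceIdRefresher"] = new_value
    return new_value


# Stand-in for ml_task_settings.get_raw(); a real settings dict has many more keys.
raw = {"splitParams": {"instanceIdRefresher": 3}}
print(bump_split_refresher(raw))
```

In DSS this would presumably be called as bump_split_refresher(ml_task_settings.get_raw()), followed by ml_task_settings.save() as in the snippet above.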

 Best,
Arnaud

gnaldi62
Author

Hi, sorry to be back again, but this doesn't seem to work properly. I print the value and I see it increase by one, but the training fails. If I do the same from the UI ("Drop existing sets, recompute new ones"), the training works. The strange thing is that sometimes, after a few tries, it starts working. Does the counter need a specific value to work?

Txs. Rgds.

Giuseppe

gnaldi62
Author

def train_deploy_models(all_saved_models, models_to_deploy):
    exit_status = 0
    for smod in all_saved_models:
        if smod['name'] in models_to_deploy:
            print("Training model %s" % smod['name'])
            algorithm_index = models_to_deploy.index(smod['name']) + 1
            algorithm_to_deploy = models_to_deploy[algorithm_index]
            current_model = smod['id']
            current_saved_model = this_project.get_saved_model(current_model)
            current_ml_task = current_saved_model.get_origin_ml_task()
            ml_task_settings = current_ml_task.get_settings()
            print(ml_task_settings.get_raw()['splitParams']['instanceIdRefresher'])
            ml_task_settings.get_raw()['splitParams']['instanceIdRefresher'] += 1
            print(ml_task_settings.get_raw()['splitParams']['instanceIdRefresher'])
            ml_task_settings.save()
            nr_attempts = 0
            while nr_attempts < 2:
                try:
                    list_trained = current_ml_task.train()
                    for jj in list_trained:
                        current_algorithm = current_ml_task.get_trained_model_details(jj).get_raw()["modeling"]["algorithm"]
                        if current_algorithm == algorithm_to_deploy:
                            current_ml_task.redeploy_to_flow(jj, saved_model_id = current_model)
                            break
                except:
                    exit_status = 1
                nr_attempts += 1
                #raise Exception("MOD-01: Error with training the model %s " % smod['name'])
            if exit_status > 0:
                print("Error with model %s" % smod['name'])
    return(exit_status)

arnaudde
Dataiker

Hello,
Could you please share the error you get and attach the logs of the training ? 
Thanks

gnaldi62
Author
Hi, below is one of the logs. We know that there is a null column, but if we retrain the
model with the "Drop existing sets, recompute new ones" checkbox ticked, the training runs successfully.
So what we need is to do the same as the UI, but programmatically. Txs. Rgds. Giuseppe
...
Traceback (most recent call last):
  File "/mnt/disks/datadir/dataiku-dss-8.0.2/python/dataiku/doctor/server.py", line 46, in serve
    ret = api_command(arg)
  File "/mnt/disks/datadir/dataiku-dss-8.0.2/python/dataiku/doctor/dkuapi.py", line 45, in aux
    return api(**kwargs)
  File "/mnt/disks/datadir/dataiku-dss-8.0.2/python/dataiku/doctor/commands.py", line 271, in train_prediction_models_nosave
    train_df = df_from_split_desc(split_desc, "train", preprocessing_params['per_feature'], core_params["prediction_type"])
  File "/mnt/disks/datadir/dataiku-dss-8.0.2/python/dataiku/doctor/utils/split.py", line 59, in df_from_split_desc
    df = df_from_split_desc_no_normalization(split_desc, split, feature_params, prediction_type)
  File "/mnt/disks/datadir/dataiku-dss-8.0.2/python/dataiku/doctor/utils/split.py", line 19, in df_from_split_desc_no_normalization
    return load_df_no_normalization(f, split_desc["schema"], feature_params, prediction_type)
  File "/mnt/disks/datadir/dataiku-dss-8.0.2/python/dataiku/doctor/utils/split.py", line 28, in load_df_no_normalization
    prediction_type=prediction_type)
  File "/mnt/disks/datadir/dataiku-dss-8.0.2/python/dataiku/doctor/utils/__init__.py", line 78, in ml_dtypes_from_dss_schema
    feature_params["role"], prediction_type)
  File "/mnt/disks/datadir/dataiku-dss-8.0.2/python/dataiku/doctor/utils/__init__.py", line 60, in ml_dtype_from_dss_column
    raise safe_exception(ValueError, u"Cannot treat column {} as numeric ({})".format(safe_unicode_str(schema_column["name"]), reason))
ValueError: Cannot treat column BDFL_SAL_D_ASP_MTG_New_LAG12_PROXY_CONV as numeric (its type is string)
[2021/02/24-11:40:08.361] [KNL-python-single-command-kernel-monitor-582175] [INFO] [dku.kernels] - Process done with code 0
[2021/02/24-11:40:08.363] [KNL-python-single-command-kernel-monitor-582175] [INFO] [dip.tickets] - Destroying API ticket for analysis-ml-FR_SF_SCORE-9bdcDC3 on behalf of terrico
[2021/02/24-11:40:08.363] [KNL-python-single-command-kernel-monitor-582175] [WARN] [dku.resource] - stat file for pid 31388 does not exist. Process died?
[2021/02/24-11:40:08.364] [KNL-python-single-command-kernel-monitor-582175] [INFO] [dku.resourceusage] - Reporting completion of CRU:{"context":{"type":"ANALYSIS_ML_TRAIN","authIdentifier":"terrico","projectKey":"FR_SF_SCORE","analysisId":"xYiPqA2f","mlTaskId":"ZqCGSjjW","sessionId":"s21"},"type":"LOCAL_PROCESS","id":"0lKuN8MwsQuacNwm","startTime":1614166805429,"localProcess":{"pid":31388,"commandName":"/mnt/disks/datadir/dataiku_data/bin/python","cpuUserTimeMS":10,"cpuSystemTimeMS":0,"cpuChildrenUserTimeMS":0,"cpuChildrenSystemTimeMS":0,"cpuTotalMS":10,"cpuCurrent":0.0,"vmSizeMB":21,"vmRSSMB":4,"vmHWMMB":4,"vmRSSAnonMB":2,"vmDataMB":2,"vmSizePeakMB":21,"vmRSSPeakMB":4,"vmRSSTotalMBS":0,"majorFaults":0,"childrenMajorFaults":0}}
[2021/02/24-11:40:08.364] [MRT-582169] [INFO] [dku.kernels] - Getting kernel tail
[2021/02/24-11:40:08.368] [MRT-582169] [INFO] [dku.kernels] - Trying to enrich exception: com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <type 'exceptions.ValueError'> : Cannot treat column BDFL_SAL_D_ASP_MTG_New_LAG12_PROXY_CONV as numeric (its type is string) from kernel com.dataiku.dip.analysis.coreservices.AnalysisMLKernel@29e58718 process=null pid=?? retcode=0
[2021/02/24-11:40:08.368] [MRT-582169] [WARN] [dku.analysis.ml.python] - Training failed
com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <type 'exceptions.ValueError'> : Cannot treat column BDFL_SAL_D_ASP_MTG_New_LAG12_PROXY_CONV as numeric (its type is string)
    at com.dataiku.dip.io.SocketBlockLinkInteraction.throwExceptionFromPython(SocketBlockLinkInteraction.java:302)
    at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.checkException(SocketBlockLinkInteraction.java:215)
    at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.get(SocketBlockLinkInteraction.java:190)
    at com.dataiku.dip.io.SingleCommandKernelLink$1.call(SingleCommandKernelLink.java:208)
    at com.dataiku.dip.analysis.ml.prediction.PredictionTrainAdditionalThread.process(PredictionTrainAdditionalThread.java:74)
    at com.dataiku.dip.analysis.ml.shared.PRNSTrainThread.run(PRNSTrainThread.java:143)
[2021/02/24-11:40:10.878] [FT-TrainWorkThread-7zCgQOd3-582158] [INFO] [dku.analysis.ml.python] T-ZqCGSjjW - Processing thread joined ...
[2021/02/24-11:40:10.879] [FT-TrainWorkThread-7zCgQOd3-582158] [INFO] [dku.analysis.ml.python] T-ZqCGSjjW - Joining processing thread ...
[2021/02/24-11:40:10.880] [FT-TrainWorkThread-7zCgQOd3-582158] [INFO] [dku.analysis.ml.python] T-ZqCGSjjW - Processing thread joined ...
[2021/02/24-11:40:10.880] [FT-TrainWorkThread-7zCgQOd3-582158] [INFO] [dku.analysis.prediction] T-ZqCGSjjW - Train done
[2021/02/24-11:40:10.881] [FT-TrainWorkThread-7zCgQOd3-582158] [INFO] [dku.analysis.prediction] T-ZqCGSjjW - Train done
[2021/02/24-11:40:10.889] [FT-TrainWorkThread-7zCgQOd3-582158] [INFO] [dku.analysis.prediction] T-ZqCGSjjW - Publishing mltask-train-done reflected event

 

gnaldi62
Author

And here is a snapshot from the analysis. The failed training session is the one run from the Python program (the function I sent you earlier), while the last successful one was run afterwards from the UI, with that checkbox ticked. GN

AAAAAA.png

arnaudde
Dataiker

Can you check and share the feature handling for the failing feature (i.e. BDFL_SAL_D_ASP_MTG_New_LAG12_PROXY_CONV), both in the current_ml_task variable in your code and in the UI? I suspect it has changed.

When you retrain from the UI, you use the latest settings from the Design tab, whereas your API training uses the settings that were in place when you trained and deployed the original model. You should not expect the same behavior if some design settings have changed.

To make sure your splits were recomputed, you can check which split was used by going to the model result page > Training information > Train & test sets > Generated.

gnaldi62
Author

Hi, here is the configuration from the API (it's for another variable, but the problem is exactly the same). This is the configuration BEFORE running the snippet you originally sent:

'BDFL_ACT_D_RR_ASP_3M_PROXY_CONV': {
'generate_derivative': False,
'numerical_handling': 'REGULAR',
'missing_handling': 'IMPUTE',
'missing_impute_with': 'MEAN',
'impute_constant_value': 0.0,
'rescaling': 'AVGSTD',
'quantile_bin_nb_bins': 4,
'binarize_threshold_mode': 'MEDIAN',
'binarize_constant_threshold': 0.0,
'role': 'INPUT',
'type': 'NUMERIC',
'customHandlingCode': '',
'customProcessorWantsMatrix': False,
'sendToInput': 'main'}

and this is AFTER:

'BDFL_ACT_D_RR_ASP_3M_PROXY_CONV': {
'category_handling': 'DUMMIFY',
'missing_handling': 'NONE',
'missing_impute_with': 'MODE',
'dummy_clip': 'MAX_NB_CATEGORIES',
'cumulative_proportion': 0.95,
'min_samples': 10,
'max_nb_categories': 100,
'max_cat_safety': 200,
'nb_bins_hashing': 1048576,
'dummy_drop': 'NONE',
'role': 'REJECT',
'type': 'CATEGORY',
'customHandlingCode': '',
'customProcessorWantsMatrix': False,
'sendToInput': 'main'}

And here is what the UI shows for the same model:

FEAT_UI.png

What is not clear to me is how to pick up the new UI settings from the API (we have thousands of such features). Rgds. Giuseppe
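
With thousands of features, one way to spot which ones changed is to compare the two per-feature dictionaries programmatically. The sketch below is only an illustration: the function name diff_feature_handling is hypothetical, and it assumes the per-feature settings have already been extracted as plain dicts (presumably from get_raw()["preprocessing"]["per_feature"], which matches the preprocessing_params['per_feature'] seen in the traceback, but that location is an assumption). It reports only the 'role' and 'type' keys shown in the dumps above.

```python
def diff_feature_handling(before, after, keys=("role", "type")):
    """Return {feature: {key: (old, new)}} for features whose handling changed.

    `before` and `after` map feature names to their handling dicts. A feature
    present on only one side is reported with None for the missing side.
    """
    changes = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, {})
        new = after.get(name, {})
        changed = {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
        if changed:
            changes[name] = changed
    return changes


# Reduced versions of the BEFORE and AFTER dumps shown above.
before = {"BDFL_ACT_D_RR_ASP_3M_PROXY_CONV": {"role": "INPUT", "type": "NUMERIC"}}
after = {"BDFL_ACT_D_RR_ASP_3M_PROXY_CONV": {"role": "REJECT", "type": "CATEGORY"}}
for feature, changed in diff_feature_handling(before, after).items():
    print(feature, changed)
```

Passing a longer tuple as keys (for example adding 'missing_handling') would widen the comparison.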

gnaldi62
Author

Hi, it seems to be working now. I removed the failed sessions, added a sleep after saving the settings, and retried training the model. I don't know which of these fixed the issue, but now all the models can be retrained and redeployed.

Txs. Rgds

Giuseppe

gnaldi62
Author

Just to let you know: we managed to get the code working, but a small manual intervention was needed. The steps we follow are:

1) duplicate the project;

2) from the code, populate the train and test datasets and retrain the saved models a first time (with the snippet you suggested);

3) if some trainings fail, go into the analysis and remove the failed sessions from the GUI;

4) go back to the code and rerun the part that retrains the saved models.

This seems to work almost always (i.e. unless the model is really bad).

Rgds. Giuseppe

arnaudde
Dataiker

My guess is that your input dataset has changed and that the feature handling needed to be updated. The start_train method of the API will not do this automatically, whereas opening the UI and launching the training will.

When an ML task is created on a dataset with an empty column, the column is automatically rejected and its role is 'REJECT'. In the original ML task settings you shared for 'BDFL_ACT_D_RR_ASP_3M_PROXY_CONV', the feature is not rejected, which probably means the column was not mostly empty when you first trained. Your new settings show that it is now rejected, which means the column is probably empty now. So I think your column data has changed significantly.

When you use the UI, the guessing system is called automatically when the analysis is opened. This does not happen when calling ml_task.get_settings(). So if your dataset has changed, training from the UI works, but training from the API without updating the feature handling fails.

You can apply the guessing system from the API with the guess method.

So, among the steps you just mentioned, I think the one that makes it work is opening the failed analysis (i.e. step 3).

Best,
Arnaud

gnaldi62
Author

OK. Is there a guess option that does not change the algorithms already chosen? The doc mentions different levels, but it is not clear what each one does (when I applied it, it changed the algorithm selection). Thanks. Rgds, Giuseppe

arnaudde
Dataiker

There is no option that keeps your algorithm settings. However, you can save the algorithm settings (and any other settings that should not change) and write them back over the guessed settings. Here is a code sample:

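ml_task = p.get_ml_task("2oftsf46", "ITJzkpmW")
# Load the current settings first so the algorithm choices can be saved
ml_task_settings = ml_task.get_settings()
algorithm_settings = ml_task_settings.get_raw()["modeling"]
# Re-run the guessing system (this also resets the algorithm settings)
ml_task.guess()
# Reload the guessed settings and put the saved algorithm settings back
ml_task_settings = ml_task.get_settings()
ml_task_settings.get_raw()["modeling"] = algorithm_settings
ml_task_settings.save()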
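
The same save, guess and restore sequence can be wrapped in a reusable function, sketched here under a few assumptions. The name reguess_keeping_modeling is hypothetical, the deep copy is a precaution added for this sketch, and the two small classes are stand-ins that only mimic get_settings(), guess(), get_raw() and save() so that the example runs outside DSS; they are not the real Dataiku classes.

```python
import copy


def reguess_keeping_modeling(ml_task):
    """Re-run guessing on an ML task while keeping its current "modeling" settings.

    `ml_task` must offer get_settings() and guess(); the returned settings
    object has already been saved.
    """
    saved_modeling = copy.deepcopy(ml_task.get_settings().get_raw()["modeling"])
    ml_task.guess()
    settings = ml_task.get_settings()
    settings.get_raw()["modeling"] = saved_modeling
    settings.save()
    return settings


class FakeSettings:
    """Stand-in for the settings object, for illustration only."""

    def __init__(self, task):
        self._task = task
        self._raw = copy.deepcopy(task.stored)

    def get_raw(self):
        return self._raw

    def save(self):
        self._task.stored = copy.deepcopy(self._raw)


class FakeMLTask:
    """Stand-in for an ML task whose guess() overwrites every setting."""

    def __init__(self):
        self.stored = {"modeling": {"algorithms": "user choice"},
                       "features": {"some_column": "INPUT"}}

    def get_settings(self):
        return FakeSettings(self)

    def guess(self):
        self.stored = {"modeling": {"algorithms": "guessed default"},
                       "features": {"some_column": "REJECT"}}


task = FakeMLTask()
reguess_keeping_modeling(task)
print(task.stored)
```

With a real ML task the call would presumably be reguess_keeping_modeling(ml_task), followed by the training call.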
gnaldi62
Author

Great: it worked! (just added a line to your code before assignment to algorithms_settings)

Many thanks. Regards,

Giuseppe

gnaldi62
Author

Many thanks. Regards.

Giuseppe
