Using a custom Python model for clustering (Agglomerative Clustering)
Hi all,
I have a question regarding custom python models for a clustering modelling task.
I am trying to do something really basic: running Agglomerative Clustering with a different metric and linkage method (both supported natively by sklearn). At the moment, I seem to be unable to do so with the default model in Dataiku, so I tried to build my own custom estimator. I would also like to pass the number of clusters as a parameter in the GUI, as with other models. Optionally, I would also like to pass the metric and linkage as GUI parameters, to test different models.
I tried to follow this link but I am not able to make it work:
from sklearn.cluster import AgglomerativeClustering
clf = AgglomerativeClustering(n_clusters=n_clusters, metric='cosine', linkage='complete')
I don't understand why, but this code doesn't work:
Failed to train : <class 'TypeError'> : __init__() got an unexpected keyword argument 'metric'
[2024/05/28-09:57:37.906] [MRT-1234331] [INFO] [dku.block.link.interaction] - Check result for nullity exceptionIfNull=true result=null
Traceback (most recent call last):
  File "/opt/dataiku/python/dataiku/doctor/server.py", line 45, in serve
    ret = api_command(arg)
  File "/opt/dataiku/python/dataiku/doctor/dkuapi.py", line 46, in aux
    return api(**kwargs)
  File "/opt/dataiku/python/dataiku/doctor/commands.py", line 673, in train_clustering_models_nosave
    pipeline)
  File "/opt/dataiku/python/dataiku/doctor/clustering_entrypoints.py", line 16, in clustering_train_score_save
    (clf, actual_params, cluster_labels, additional_columns) = clustering_fit(modeling_params, transformed_src)
  File "/opt/dataiku/python/dataiku/doctor/clustering/clustering_fit.py", line 108, in clustering_fit
    clf = clustering_model_from_params(modeling_params, len(train.index))
  File "/opt/dataiku/python/dataiku/doctor/clustering/clustering_fit.py", line 39, in clustering_model_from_params
    return scikit_model(modeling_params)
  File "/opt/dataiku/python/dataiku/doctor/clustering/clustering_fit.py", line 22, in scikit_model
    exec(code, ctx)
  File "<string>", line 3, in <module>
TypeError: __init__() got an unexpected keyword argument 'metric'
[2024-05-28 07:57:37,907] [11/MainThread] [INFO] [dataiku.base.socket_block_link] Client closed
But this works in a Jupyter notebook (see attachment).
Could anyone support me please?
Thanks!
Best Answer
It seems the metric parameter was added in scikit-learn 1.2. So my guess is that the code environment you are using for the notebook has sklearn >= 1.2, but the one you are using for the Lab ML task has an earlier version of sklearn.
You must use a code environment that has both the recommended packages for Visual ML and an sklearn version of at least 1.2. Note that support for sklearn 1.2 in Visual ML was introduced in Dataiku 12.5.0, so you'll need to be on at least this version.
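If switching code environments is not immediately possible, one option is to pick the keyword name at run time. Before scikit-learn 1.2 the same setting was passed as affinity, so the sketch below inspects the estimator's constructor and uses whichever name it accepts. The helper names distance_param_name and build_clusterer are hypothetical (they are not part of Dataiku or scikit-learn), and this is only an illustration under those assumptions, not a replacement for upgrading the environment.

```python
import inspect


def distance_param_name(estimator_cls):
    """Return 'metric' if the constructor accepts it, otherwise 'affinity'.

    Hypothetical helper: scikit-learn >= 1.2 accepts `metric` on
    AgglomerativeClustering; earlier versions only accept `affinity`.
    """
    params = inspect.signature(estimator_cls.__init__).parameters
    return "metric" if "metric" in params else "affinity"


def build_clusterer(estimator_cls, n_clusters, distance="cosine", linkage="complete"):
    """Instantiate the estimator with the distance keyword it supports."""
    kwargs = {"n_clusters": n_clusters, "linkage": linkage}
    kwargs[distance_param_name(estimator_cls)] = distance
    return estimator_cls(**kwargs)


if __name__ == "__main__":
    try:
        import sklearn
        from sklearn.cluster import AgglomerativeClustering
    except ImportError:
        print("scikit-learn is not installed in this environment")
    else:
        # Printing the version shows which environment the code really runs in.
        print("scikit-learn version:", sklearn.__version__)
        clf = build_clusterer(AgglomerativeClustering, n_clusters=3)
        print(clf)
```

Printing sklearn.__version__ in both the notebook and the custom model code is also a quick way to confirm that the two are running against different code environments.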
Answers
Thanks a lot for the timely response!
Indeed, that was exactly my issue: moving to a different code environment with more recent packages solved it.
Cheers