
Using custom python model for Clustering (Agglomerative Clustering)

Solved!
smp

Hi all, 

I have a question regarding custom python models for a clustering modelling task.

I am trying to do something really basic: run Agglomerative Clustering with a different metric and linkage method (both natively supported in sklearn). At the moment I seem unable to do so with the default model in Dataiku, so I tried to build my own custom estimator. I would also like to pass the number of clusters as a parameter from the GUI, as with other models. Optionally, I would like to pass the metric and linkage as GUI parameters too, so I can test different models.

I tried to follow this link, but I am not able to make it work:

  1. https://doc.dataiku.com/dss/latest/machine-learning/algorithms/in-memory-python.html#custom-models-c...

from sklearn.cluster import AgglomerativeClustering

clf = AgglomerativeClustering(n_clusters=n_clusters, metric='cosine', linkage='complete')

I don't understand why, but this code fails when training in Dataiku:

Failed to train : <class 'TypeError'> : __init__() got an unexpected keyword argument 'metric'
[2024/05/28-09:57:37.906] [MRT-1234331] [INFO] [dku.block.link.interaction]  - Check result for nullity exceptionIfNull=true result=null
Traceback (most recent call last):
  File "/opt/dataiku/python/dataiku/doctor/server.py", line 45, in serve
    ret = api_command(arg)
  File "/opt/dataiku/python/dataiku/doctor/dkuapi.py", line 46, in aux
    return api(**kwargs)
  File "/opt/dataiku/python/dataiku/doctor/commands.py", line 673, in train_clustering_models_nosave
    pipeline)
  File "/opt/dataiku/python/dataiku/doctor/clustering_entrypoints.py", line 16, in clustering_train_score_save
    (clf, actual_params, cluster_labels, additional_columns) = clustering_fit(modeling_params, transformed_src)
  File "/opt/dataiku/python/dataiku/doctor/clustering/clustering_fit.py", line 108, in clustering_fit
    clf = clustering_model_from_params(modeling_params, len(train.index))
  File "/opt/dataiku/python/dataiku/doctor/clustering/clustering_fit.py", line 39, in clustering_model_from_params
    return scikit_model(modeling_params)
  File "/opt/dataiku/python/dataiku/doctor/clustering/clustering_fit.py", line 22, in scikit_model
    exec(code, ctx)
  File "<string>", line 3, in <module>
TypeError: __init__() got an unexpected keyword argument 'metric'
[2024-05-28 07:57:37,907] [11/MainThread] [INFO] [dataiku.base.socket_block_link] Client closed

However, the same code works in a Jupyter notebook (see attachment).

Could anyone help me, please?

Thanks!

AdrienL
Dataiker

It seems the metric parameter of AgglomerativeClustering was added in scikit-learn 1.2. So my guess is that the code environment you are using for the notebook has sklearn >= 1.2, while the one you are using for the Lab ML task has an earlier version of sklearn.
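
To illustrate the parameter difference, here is a minimal sketch that picks whichever keyword the installed estimator accepts. It assumes the older releases expose the same setting under the name affinity (which is what scikit-learn used before metric was introduced). The helper names distance_kwarg_name and make_clusterer are made up for this example and are not part of sklearn or Dataiku.

```python
import inspect


def distance_kwarg_name(estimator_cls):
    """Return the constructor keyword that selects the distance, or None.

    Hypothetical helper: checks the constructor signature instead of
    relying on a version number.
    """
    params = inspect.signature(estimator_cls.__init__).parameters
    for name in ("metric", "affinity"):  # prefer the newer name when both exist
        if name in params:
            return name
    return None


def make_clusterer(estimator_cls, n_clusters, distance="cosine", linkage="complete"):
    """Build the estimator using whichever distance keyword it supports."""
    name = distance_kwarg_name(estimator_cls)
    if name is None:
        raise TypeError("estimator accepts neither 'metric' nor 'affinity'")
    return estimator_cls(n_clusters=n_clusters, linkage=linkage, **{name: distance})


# Only try the real estimator when scikit-learn is importable here.
try:
    from sklearn.cluster import AgglomerativeClustering
except ImportError:
    AgglomerativeClustering = None

if AgglomerativeClustering is not None:
    clf = make_clusterer(AgglomerativeClustering, n_clusters=3)
    print("distance keyword used:", distance_kwarg_name(AgglomerativeClustering))
```

This only works around the naming difference; it does not replace using a supported code environment.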

You must use a code environment that both has the recommended packages for Visual ML and has sklearn at version 1.2 or later. Note that support for sklearn 1.2 in Visual ML was introduced in Dataiku 12.5.0, so you'll need to be on at least that version.
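
A quick way to compare environments is to print the installed scikit-learn version from a notebook attached to each one. The snippet below is only a sketch using the standard library; the function names and the MIN_VERSION constant are illustrative, and the 1.2 threshold is the one mentioned above.

```python
import re
from importlib import metadata

MIN_VERSION = (1, 2)  # first release with the `metric` parameter, per the answer above


def major_minor(version):
    """Extract (major, minor) from a version string such as '1.2.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def supports_metric(version):
    """True if this scikit-learn version is at least MIN_VERSION."""
    return major_minor(version) >= MIN_VERSION


# Look up the version installed in the current code environment, if any.
try:
    installed = metadata.version("scikit-learn")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("scikit-learn is not installed in this environment")
else:
    print("scikit-learn", installed, "supports metric:", supports_metric(installed))
```

Running it once in the notebook's environment and once in the environment selected for the ML task should show whether the two differ.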

smp

Thanks a lot for the timely response!

Indeed, that was exactly my issue: moving to a different code environment with more recent packages solved it.

Cheers
