Using a custom Python model for clustering (Agglomerative Clustering)

smp Registered Posts: 4 ✭✭✭
edited July 16 in Using Dataiku

Hi all,

I have a question regarding custom python models for a clustering modelling task.

I am trying to do something really basic: run Agglomerative Clustering with a different metric and linkage method (both supported natively by sklearn). At the moment, I don't seem to be able to do this with the default model in Dataiku, so I tried to build my own custom estimator. I would also like to pass the number of clusters as a parameter from the GUI, as with other models. Optionally, I would like to pass the metric and linkage as GUI parameters too, so I can test different models.

I tried to follow this link but I am not able to make it work:

  1. https://doc.dataiku.com/dss/latest/machine-learning/algorithms/in-memory-python.html#custom-models-clustering

from sklearn.cluster import AgglomerativeClustering

clf = AgglomerativeClustering(n_clusters=n_clusters, metric='cosine', linkage='complete')

I don't understand why, but this code doesn't work:

Failed to train : <class 'TypeError'> : __init__() got an unexpected keyword argument 'metric'
[2024/05/28-09:57:37.906] [MRT-1234331] [INFO] [dku.block.link.interaction]  - Check result for nullity exceptionIfNull=true result=null
Traceback (most recent call last):
  File "/opt/dataiku/python/dataiku/doctor/server.py", line 45, in serve
    ret = api_command(arg)
  File "/opt/dataiku/python/dataiku/doctor/dkuapi.py", line 46, in aux
    return api(**kwargs)
  File "/opt/dataiku/python/dataiku/doctor/commands.py", line 673, in train_clustering_models_nosave
    pipeline)
  File "/opt/dataiku/python/dataiku/doctor/clustering_entrypoints.py", line 16, in clustering_train_score_save
    (clf, actual_params, cluster_labels, additional_columns) = clustering_fit(modeling_params, transformed_src)
  File "/opt/dataiku/python/dataiku/doctor/clustering/clustering_fit.py", line 108, in clustering_fit
    clf = clustering_model_from_params(modeling_params, len(train.index))
  File "/opt/dataiku/python/dataiku/doctor/clustering/clustering_fit.py", line 39, in clustering_model_from_params
    return scikit_model(modeling_params)
  File "/opt/dataiku/python/dataiku/doctor/clustering/clustering_fit.py", line 22, in scikit_model
    exec(code, ctx)
  File "<string>", line 3, in <module>
TypeError: __init__() got an unexpected keyword argument 'metric'
[2024-05-28 07:57:37,907] [11/MainThread] [INFO] [dataiku.base.socket_block_link] Client closed

However, the same code works in a Jupyter notebook (see attachment).

Could anyone help me with this, please?

Thanks!

Best Answer

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker
    Answer ✓

    It seems the metric parameter of AgglomerativeClustering was added in scikit-learn 1.2 (earlier versions call it affinity). So my guess is that the code environment used by your notebook has sklearn >= 1.2, while the one used by the Lab ML task has an earlier version of sklearn.

    You must use a code environment that both has the recommended packages for Visual ML and for which the version of sklearn is at least 1.2. Note that support for sklearn 1.2 in Visual ML was introduced in Dataiku 12.5.0, so you'll need to be at least on this version.
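To confirm this kind of version mismatch, or to work around it temporarily, something like the following sketch could be used. It checks which distance keyword the installed AgglomerativeClustering accepts and builds the estimator accordingly. The helper names distance_kwarg and build_clf are hypothetical and are not part of Dataiku or scikit-learn. How Dataiku passes the number of clusters into custom model code is not covered here.

```python
import inspect


def distance_kwarg(estimator_cls):
    """Return the distance keyword that estimator_cls accepts.

    scikit-learn >= 1.2 accepts `metric`; older releases only know `affinity`.
    """
    params = inspect.signature(estimator_cls.__init__).parameters
    return "metric" if "metric" in params else "affinity"


def build_clf(n_clusters, distance="cosine", linkage="complete"):
    """Hypothetical helper: build the estimator with whichever keyword exists."""
    # Imported here so the version check runs in the same code environment
    # that will train the model.
    import sklearn
    from sklearn.cluster import AgglomerativeClustering

    print("scikit-learn version:", sklearn.__version__)
    kwargs = {distance_kwarg(AgglomerativeClustering): distance}
    return AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage, **kwargs)
```

In the custom model code this would be called as, for example, clf = build_clf(n_clusters=5), where 5 is only an illustrative value. Running the same snippet in the notebook and in the Lab ML task shows whether the two code environments differ. Switching to a code environment with sklearn >= 1.2, as described above, remains the cleaner fix.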

Answers

  • smp
    smp Registered Posts: 4 ✭✭✭

    Thanks a lot for the timely response!

    Indeed, that was exactly my issue: moving to a different code environment with more recent packages solved it.

    Cheers
