Issue using BERTopic with umap in a Jupyter notebook

June Dataiku DSS Core Designer, Registered Posts: 19
edited July 16 in Using Dataiku

An issue occurs when passing the umap model to the BERTopic class.

I am trying to reproduce this code:

I get this error in my Jupyter Notebook:

TypeError                                 Traceback (most recent call last)
<ipython-input-2-2c902ae25352> in <module>
      1 umap_model = UMAP(n_neighbors=15, n_components=5, 
      2                   min_dist=0.0, metric='cosine', random_state=42)
----> 3 topic_model = BERTopic(umap_model=umap_model)

TypeError: __init__() got an unexpected keyword argument 'umap_model'

I am using a custom code env with these packages installed:

  • bertopic==0.3.4
  • umap-learn==0.5.3

My complete code is this:

import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
from bertopic import BERTopic
from umap import UMAP

umap_model = UMAP(n_neighbors=15, n_components=5, 
                  min_dist=0.0, metric='cosine', random_state=42)
topic_model = BERTopic(umap_model=umap_model)

This code (without the Dataiku packages) works in an external IDE.
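Because the same code runs in an external IDE, the two environments may be loading different package versions. A minimal sketch for comparing them, assuming Python 3.8 or later (for the standard-library importlib.metadata module). The helper name installed_versions is hypothetical:

```python
from importlib.metadata import version, PackageNotFoundError


def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            found[name] = None
    return found


# Run this in both the Dataiku notebook and the external IDE, then compare.
print(installed_versions(["bertopic", "umap-learn"]))
```

If the notebook reports bertopic 0.3.4 while the IDE reports a newer release, the difference in versions would explain why only one of them accepts umap_model.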

Operating system used: Windows


Best Answer

  • shashank
    shashank Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 27 Dataiker

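    The umap_model argument is only available in BERTopic version 0.10 and later, so bertopic==0.3.4 does not accept it.

    I tested the code with bertopic==0.12.0 and it works, so updating the code env to that version should fix the error.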
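To get a clearer message than the TypeError, a notebook could check the installed BERTopic version before passing umap_model. This is a minimal sketch under these assumptions: Python 3.8 or later, a 0.10 threshold taken from the answer, and hypothetical helper names (parse_version, supports_umap_model, check_bertopic):

```python
import re
from importlib.metadata import version

# Threshold from the answer: umap_model is accepted from BERTopic 0.10 onward.
MIN_BERTOPIC = (0, 10)


def parse_version(text):
    """Convert a version string such as '0.12.0' into a tuple of ints (0, 12, 0).

    Only the leading numeric release part is kept, so '0.12.0rc1' -> (0, 12, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def supports_umap_model(text):
    """True if this BERTopic version should accept the umap_model argument."""
    return parse_version(text) >= MIN_BERTOPIC


def check_bertopic():
    """Raise a readable error if the active code env has an older BERTopic."""
    installed = version("bertopic")
    if not supports_umap_model(installed):
        raise RuntimeError(
            f"bertopic {installed} does not accept umap_model; "
            "update the code env to bertopic>=0.10 (e.g. bertopic==0.12.0)."
        )
    return installed
```

In the notebook, calling check_bertopic() before BERTopic(umap_model=umap_model) would stop early with that message when the code env still has bertopic==0.3.4. Comparing tuples of integers rather than raw strings avoids the trap where "0.3.4" sorts after "0.12.0" as text.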

