Issue using BERTopic with umap in a Jupyter notebook

Dataiku DSS Core Designer, Registered Posts: 20 ✭✭✭✭✭
edited July 2024 in Using Dataiku

I get an error when I pass a UMAP model to the BERTopic class.

I am trying to reproduce the code from this BERTopic FAQ entry:

https://maartengr.github.io/BERTopic/faq.html#why-are-the-results-not-consistent-between-runs

I get this error in my Jupyter Notebook:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-2c902ae25352> in <module>
      1 umap_model = UMAP(n_neighbors=15, n_components=5, 
      2                   min_dist=0.0, metric='cosine', random_state=42)
----> 3 topic_model = BERTopic(umap_model=umap_model)

TypeError: __init__() got an unexpected keyword argument 'umap_model'

I am using a custom code env with these packages installed:

  • bertopic==0.3.4
  • umap-learn==0.5.3

My complete code is this:

import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
from bertopic import BERTopic
from umap import UMAP

umap_model = UMAP(n_neighbors=15, n_components=5, 
                  min_dist=0.0, metric='cosine', random_state=42)
topic_model = BERTopic(umap_model=umap_model)

The same code (without the dataiku imports) works in an external IDE.


Operating system used: Windows

Best Answer

  • Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 28 Dataiker
    Answer ✓

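    The umap_model argument is not available in bertopic==0.3.4, which is why the constructor rejects it. It is available in BERTopic versions >= 0.10.

    I tested the code with bertopic==0.12.0; upgrading the code env to that version should work.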
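
    A quick way to confirm this from inside the notebook is to check which version the kernel actually sees and whether the constructor declares the keyword. The sketch below is only illustrative: the helper names installed_version and accepts_keyword are made up for this example, and OldTopicModel / NewTopicModel are hypothetical stand-ins so the snippet runs without bertopic installed.

```python
import inspect
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def accepts_keyword(cls, name):
    """Return True if cls.__init__ declares `name` or accepts **kwargs."""
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins, NOT the real BERTopic class: one constructor
# without the keyword (like an old release) and one with it.
class OldTopicModel:
    def __init__(self, n_gram_range=(1, 1)):
        self.n_gram_range = n_gram_range


class NewTopicModel:
    def __init__(self, n_gram_range=(1, 1), umap_model=None):
        self.n_gram_range = n_gram_range
        self.umap_model = umap_model


# Version seen by the current kernel (None if the package is not installed).
print(installed_version("bertopic"))

print(accepts_keyword(OldTopicModel, "umap_model"))  # False
print(accepts_keyword(NewTopicModel, "umap_model"))  # True
```

    In the notebook itself, the same check would be accepts_keyword(BERTopic, "umap_model") after from bertopic import BERTopic. If it returns False, the code env is still on a release that predates the argument, and the kernel may need a restart after the env is rebuilt.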
