Sentence Embedding - Machine/Deep Learning Model

Level 2

Using a Python 2.7 code environment, I was able to use the macro's pre-trained word embeddings and create sentence embeddings with the plugin for a column in my dataset that contains a corpus of text data. The sentence embeddings are written to a separate column. What I'm trying to figure out is how I can use DSS to classify the text in that column using the sentence embeddings. I would also like to map back to the actual words once the clustering is complete so I can analyze the text.

What I tried was running the available K-means algorithm on the sentence embedding column to create clusters, but I often got list index errors while fitting the model.

Can you give me some advice on how to use the sentence embedding plugin's output with the existing deep/machine learning algorithms?

Thanks,

3 Replies
Dataiker

Hi,

In order to use the embedding column for Machine Learning / Deep Learning models, you can choose the "Vector" feature handling, as shown below:

[Image: Screenshot 2020-05-07 at 01.56.00.png]
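
If the fit still fails, one thing worth checking in a Python recipe or notebook is whether every row actually holds a vector of the same length. This is only a guess at the cause of the index errors, not something confirmed in this thread. The sketch below assumes the embedding column is stored as a JSON array string (for example "[0.1, 0.2, 0.3]"); `parse_embeddings` is a hypothetical helper name, not part of the plugin or of DSS.

```python
import json


def parse_embeddings(rows, expected_dim=None):
    """Parse JSON-array strings into lists of floats.

    Rows that are empty, unparseable, or of a different length than
    the first valid vector are skipped. Returns (kept_indices, vectors)
    so the surviving vectors can be matched back to their source rows.
    Assumption: each embedding is serialized as a JSON array string.
    """
    kept, vectors = [], []
    for i, raw in enumerate(rows):
        if raw is None or not str(raw).strip():
            continue  # missing embedding
        try:
            parsed = json.loads(raw)
            if not isinstance(parsed, list):
                continue  # not an array
            vec = [float(x) for x in parsed]
        except (ValueError, TypeError):
            continue  # malformed JSON or non-numeric entries
        if not vec:
            continue  # empty array
        if expected_dim is None:
            expected_dim = len(vec)  # infer from first valid row
        if len(vec) != expected_dim:
            continue  # inconsistent dimension
        kept.append(i)
        vectors.append(vec)
    return kept, vectors


rows = ["[0.1, 0.2, 0.3]", "", None, "[1, 2]", "not json", "[4, 5, 6]"]
kept, vectors = parse_embeddings(rows)
print(kept)
print(len(vectors[0]))
```

Keeping the original row indices alongside the cleaned vectors makes it possible to join cluster assignments back to the text column afterwards.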

Hope it helps,

Alex

Level 2
Author

Thanks Alex for that information, it really helps. 

Right now I have the cluster number and cluster ID in a separate column. How do I convert the vectors back to words so that each cluster is shown as a string of words instead?

For example, instead of cluster ID I want it to display the topics. 

Thanks, 

Vinh

Dataiker

Hi,

The plugin outputs vectors at the sentence level, so there's no direct way to map them back to words.
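
Since the vectors themselves can't be inverted, one workaround is to go back to the original text column and describe each cluster by its most distinctive words. The sketch below is not a plugin feature: it applies a simple TF-IDF-style score that treats each cluster as a single document, and `top_terms_per_cluster` is a hypothetical helper name. It assumes you have the original texts and their cluster IDs as two parallel lists.

```python
import math
import re
from collections import Counter, defaultdict


def top_terms_per_cluster(texts, cluster_ids, n_terms=5):
    """Return {cluster_id: [term, ...]} with the highest-scoring terms.

    Score = (term frequency within the cluster)
            * log((1 + n_clusters) / (1 + clusters containing the term)),
    so words that appear in every cluster score zero.
    """
    counts = defaultdict(Counter)
    for text, cid in zip(texts, cluster_ids):
        # Naive tokenization: lowercase alphabetic runs only.
        counts[cid].update(re.findall(r"[a-z']+", text.lower()))

    n_clusters = len(counts)
    # For each term, the number of clusters in which it occurs.
    cluster_freq = Counter(t for c in counts.values() for t in c)

    labels = {}
    for cid, c in counts.items():
        total = sum(c.values()) or 1
        scores = {
            term: (n / total)
            * math.log((1 + n_clusters) / (1 + cluster_freq[term]))
            for term, n in c.items()
        }
        # Sort by descending score, then alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels[cid] = ranked[:n_terms]
    return labels


texts = ["the cat sat", "the cat purred",
         "the stock fell", "the stock market rose"]
cluster_ids = [0, 0, 1, 1]
print(top_terms_per_cluster(texts, cluster_ids, n_terms=1))
```

The resulting term lists can be joined into a string and used as a display label in place of the cluster ID. A stop-word list or better tokenizer would likely improve the labels on real data.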

If your use case is about understanding clusters of documents, I would suggest using the 'Topic modeling' predefined notebook: https://doc.dataiku.com/dss/latest/notebooks/predefined-notebooks.html

Best regards,

Alex