Conundrum 20: Fashion Reviews - What's in a word?

Community Manager


Today’s conundrum is the second in our ‘Fashion Reviews’ series. Today we take a look at the reviews we cleaned up last time and aim to analyse their sentiment. With a little luck we can begin to identify some common themes.

Using the cleaned data you generated when completing Conundrum 17, and utilising the Sentiment analysis plugin, can you create a word cloud for the positive reviews and another for the negative reviews?

What jumps out at you from each cloud? Are there any insights you can draw on what things people like and what might need improvement?

For bonus points: try making your word cloud code (R or Python) into a plugin!

Good luck! 


PS: Need more practice using the Sentiment analysis plugin? Give Conundrum 18 a go!

9 Replies

Katie Gross did a nice presentation on NLP last week.  However, I'm having difficulties finding the Word Cloud Plugin.

I found this on GitHub.

But it does not look complete.

And this GitHub repo for the word cloud plugin is empty at this time.

The NLP BrightTalk presentation from earlier this week references a word cloud plugin at

WordCloud Plugin.jpg


@MichaelG I could build a Shiny app to do a word cloud.  However, that feels like more work than I want to do for a Conundrum.

Dataiker Alumni

Hi @tgb417 ,

There is no specific plugin for word clouds in DSS as of today. What @MichaelG was suggesting was to write your own code (in R or Python) that would, for example, save the word cloud image in a managed folder. You could then turn that code into a plugin (could be useful for your 'non-coder' colleagues or even yourself if you think you will want to generate more word clouds in the future!).

You might want to check the wordcloud package (same name for R and Python). A word cloud can be generated in very few lines of code using either!
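For anyone following along, here is a minimal Python sketch of that idea. The sample reviews, the tiny stop-word list and the helper names are placeholders of mine, not something from the thread. The rendering step assumes the `wordcloud` package is installed and is skipped if it is not.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; use a fuller list in practice).
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "i", "this",
              "to", "of", "in", "but", "was"}

def word_frequencies(reviews):
    """Count how often each non-stop-word appears across all reviews."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

def save_word_cloud(frequencies, path):
    """Render the frequencies to an image file; returns False if wordcloud is missing."""
    try:
        from wordcloud import WordCloud
    except ImportError:
        return False
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(dict(frequencies))
    cloud.to_file(path)
    return True

# Hypothetical sample reviews standing in for the cleaned Conundrum 17 data.
sample_reviews = [
    "I love this dress, the fit is perfect",
    "Love the fabric but the fit was small",
]
freqs = word_frequencies(sample_reviews)
```

Calling `save_word_cloud(freqs, "all_reviews.png")` would then write the image; from there it could be copied into a managed folder as suggested above.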


Thanks for the insight.

Here is a first attempt at a word cloud from the data.  This does not answer the whole conundrum.  This is from all of the words by the number of occurrences, not sentiment segregated words.  


@AnitaC I don't see a lot about creating a chart-type plugin in the documentation.  Saving the image files to a folder does not seem to be a very DSS-like way to do this.  I know there is a way to package a chart into a plugin, because there are other charts in the app store.  However, I'm not seeing a lot of documentation on setting this up.  Can you point me to the details?

P.S. A big Thanks to Katie Gross of Dataiku for a lot of help here.


Here are a couple:

Words in positive reviews

size by the number of times the word appears

Positive Cloud.jpg

Words in negative reviews

size by the number of times the word appears

Negitive Cloud.jpg


However, there is a huge overlap.  I'm starting to think about ways to pull out the words that show up only in the positive word cloud and those that show up only in the negative word cloud.
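One way this could be sketched: compare how often each word appears in the two groups and keep the words whose relative frequency is most lopsided. The smoothed log-ratio score below is just one possible choice, not something the plugin provides, and the counts are made up for illustration.

```python
import math
from collections import Counter

def distinctive_words(pos_counts, neg_counts, top_n=10):
    """Rank words by a smoothed log-ratio of relative frequency (positive vs negative)."""
    vocab = set(pos_counts) | set(neg_counts)
    # Add-one smoothing so a word missing from one group does not divide by zero.
    pos_total = sum(pos_counts.values()) + len(vocab)
    neg_total = sum(neg_counts.values()) + len(vocab)
    scores = {}
    for word in vocab:
        p = (pos_counts.get(word, 0) + 1) / pos_total
        n = (neg_counts.get(word, 0) + 1) / neg_total
        scores[word] = math.log(p / n)
    ranked = sorted(scores, key=scores.get, reverse=True)
    # First list: most typical of positive reviews; second: most typical of negative.
    return ranked[:top_n], ranked[::-1][:top_n]

# Made-up word counts for illustration only.
pos = Counter({"dress": 50, "love": 40, "perfect": 30, "fit": 20})
neg = Counter({"dress": 45, "fit": 25, "returned": 30, "cheap": 20})
most_positive, most_negative = distinctive_words(pos, neg, top_n=2)
```

Words such as "dress" and "fit" that are common in both groups score near zero and drop out, which is the overlap problem described above.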



Here are two more.

Words in Positive Posts

with predicted confidence of positive post >= .95
n = 15,009 posts

Positive greater than 95 confidence.jpg


Words in Negative Posts

with predicted confidence of negative post of >= .95
n = 3,332 posts

Negitive greater than 95 confidence v2.jpg

In these word clouds, we are removing all of the posts for which we were unsure if the post was positive or negative.  
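In code, that filter could look roughly like the sketch below. The field names `text`, `prediction` and `confidence` are hypothetical; substitute whatever columns your sentiment recipe actually outputs.

```python
THRESHOLD = 0.95  # the cut-off used for the clouds above

def confident_reviews(rows, label, threshold=THRESHOLD):
    """Keep review texts predicted as `label` with confidence >= threshold."""
    return [
        row["text"]
        for row in rows
        if row["prediction"] == label and row["confidence"] >= threshold
    ]

# Made-up rows for illustration only.
rows = [
    {"text": "love it", "prediction": "positive", "confidence": 0.99},
    {"text": "it is okay", "prediction": "positive", "confidence": 0.62},
    {"text": "not worth it", "prediction": "negative", "confidence": 0.97},
    {"text": "runs small", "prediction": "negative", "confidence": 0.80},
]
positive_texts = confident_reviews(rows, "positive")
negative_texts = confident_reviews(rows, "negative")
```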

That's all for me for now.


So, today I've been playing with this a bit.  

I ended up building a logistic regression model that would predict what words (and tri-grams and bi-grams) drive a Positive or Negative review. 

In the visualizations below I am weighting the size of the words by the regression coefficients of that model.

Most Positive Words 

by regression coefficients.

Positive Words by Coefficent.jpg


Most Negative Words 

by regression coefficients.

Negitive Words by Coefficent.jpg
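For anyone who wants to try the same idea in code, here is a rough, dependency-free sketch. It is not the model used for the clouds above: it is a small hand-written logistic regression over word 1-, 2- and 3-gram counts, with made-up reviews and hyperparameters. In practice scikit-learn's `LogisticRegression` with an n-gram vectorizer would be the more usual choice.

```python
import math
import re
from collections import Counter, defaultdict

def ngrams(text, max_n=3):
    """Return all word 1-, 2- and 3-grams in a review."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]

def fit_logistic(texts, labels, epochs=200, lr=0.1, l2=0.01):
    """Fit n-gram weights by batch gradient ascent on the L2-penalised log-likelihood."""
    docs = [Counter(ngrams(t)) for t in texts]
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        grad = defaultdict(float)
        grad_bias = 0.0
        for doc, y in zip(docs, labels):
            z = bias + sum(weights[g] * c for g, c in doc.items())
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "positive"
            for g, c in doc.items():
                grad[g] += (y - p) * c
            grad_bias += y - p
        for g, value in grad.items():
            weights[g] += lr * (value - l2 * weights[g])
        bias += lr * grad_bias
    return dict(weights)

# Made-up reviews: label 1 = positive, 0 = negative.
texts = [
    "love this dress",
    "perfect fit love it",
    "not worth the price",
    "cute but not worth it",
]
labels = [1, 1, 0, 0]
weights = fit_logistic(texts, labels)

# Positive coefficients would size the positive cloud, negative ones the negative cloud.
positive_sizes = {g: w for g, w in weights.items() if w > 0}
negative_sizes = {g: -w for g, w in weights.items() if w < 0}
```

The two dictionaries can be passed to a word cloud library as frequencies, so that size reflects coefficient magnitude rather than raw counts.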


I'd like to invite others to jump in here with some further ideas.



In looking at the negative phrases I'm seeing all of these phrases like 

  • really wanted ____________ 
  • beautiful but _____________
  • cute but ___________
  • not worth ___________

In all of these cases, I'd really like to know what follows these phrases, which are the most indicative of a potential problem.  I have not figured out a way to pull this information yet.  Anyone got an idea?
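One possible approach, sketched with a regular expression: capture the next few words after each cue phrase and count the most common continuations. The four-word window and the sample reviews are arbitrary choices for illustration.

```python
import re
from collections import Counter

CUE_PHRASES = ["really wanted", "beautiful but", "cute but", "not worth"]

# Capture up to four words following any cue phrase.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in CUE_PHRASES) + r")\s+((?:[\w']+\s*){1,4})",
    re.IGNORECASE,
)

def what_follows(reviews):
    """Map each cue phrase to a Counter of the word runs that follow it."""
    found = {phrase: Counter() for phrase in CUE_PHRASES}
    for text in reviews:
        for cue, tail in _PATTERN.findall(text):
            found[cue.lower()][tail.strip().lower()] += 1
    return found

# Made-up reviews for illustration only.
sample = [
    "Cute but too short for me.",
    "Beautiful but the zipper broke after one wear.",
    "Not worth the price.",
]
continuations = what_follows(sample)
```

`continuations["cute but"].most_common(20)` would then list the most frequent complaints that follow that phrase. The capture stops at punctuation, so short continuations such as "the price" are kept whole.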


I've come back to look at this Conundrum.

I'm wanting to pull out any named entities from these reviews.  Things like Dresses, skirts, slacks, tops...  And in some ways, I've gotten things like condition sizes, colors, quality.  However, I'd like to be able to pull these out more explicitly.

I tried the Named Entity Recognition Plugin; however, the results using spaCy did not seem to be that good.  So, I'm wondering about getting a hold of some of the Flair models.  However, I've not figured out:

  • How to get a hold of the Flair model files
  • Which model files to use.
    • There seem to be a bunch of these model files, maybe even some designed for parsing reviews.  

Is anyone who is familiar with this plugin able to comment?

Thanks for any help you can share.

cc: @duphan 


Thanks for reminding me about the sentiment analysis plugin!
