
Conundrum 20: Fashion Reviews - What's in a word?

Community Manager

Today’s conundrum is the second in our ‘Fashion Reviews’ series. We take the reviews we cleaned up last time and aim to analyse their sentiment. With a little luck, we can begin to identify some common themes.

Using the cleaned data you generated in Conundrum 17 and the Sentiment analysis plugin, can you create one word cloud for the positive reviews and another for the negative reviews?

What jumps out at you from each cloud? Are there any insights you can draw on what things people like and what might need improvement?

For bonus points: try making your word cloud code (R or Python) into a plugin!

Good luck! 

 

PS: Need more practice using the Sentiment analysis plugin? Give Conundrum 18 a go!

8 Replies
Level 6

Katie Gross did a nice presentation on NLP last week. However, I'm having difficulty finding the Word Cloud plugin.

I found this on GitHub.

https://github.com/dataiku/dss-plugin-nlp-visualization

But it does not look complete.

And this GitHub repo for the word cloud plugin is empty at this time.

https://github.com/dataiku/dss-plugin-wordcloud

The NLP BrightTalk presentation from earlier this week references a word cloud plugin at

https://www.brighttalk.com/webcast/17108/430154?

WordCloud Plugin.jpg

 

@MichaelG I could build a Shiny app to do a word cloud. However, that feels like more work than I want to do for a Conundrum.

--Tom
Dataiker

Hi @tgb417 ,

There is no specific plugin for word clouds in DSS as of today. What @MichaelG was suggesting was to write your own code (in R or Python) that would, for example, save the word cloud image in a managed folder. You could then turn that code into a plugin (could be useful for your 'non-coder' colleagues or even yourself if you think you will want to generate more word clouds in the future!).

You might want to check out the wordcloud package (same name in R and Python). With either one, a word cloud can be generated in just a few lines of code!
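For anyone who wants a starting point in Python, here is a minimal sketch of that idea, not an official recipe. It assumes the reviews are already available as a list of strings, uses a tiny illustrative stop-word list, and the function names (word_frequencies, save_word_cloud) and the output file name are made up for this example. Writing the image into a DSS managed folder is left out because it depends on your project setup.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); swap in a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "i", "this",
              "to", "of", "in", "but", "was", "for"}

def word_frequencies(reviews, stop_words=STOP_WORDS):
    """Count lowercase word occurrences across reviews, skipping stop words."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words and len(t) > 1)
    return counts

def save_word_cloud(frequencies, path):
    """Render the counts as an image file. Requires `pip install wordcloud`."""
    from wordcloud import WordCloud  # imported lazily so counting works without it
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(dict(frequencies))
    cloud.to_file(path)

# Made-up sample reviews, only to show the shape of the input.
sample_reviews = [
    "Love this dress, the fabric is soft and the fit is perfect.",
    "The fabric is thin and the fit was too tight.",
]
freqs = word_frequencies(sample_reviews)
# save_word_cloud(freqs, "wordcloud.png")  # uncomment once wordcloud is installed
```

Word size in the resulting image follows the counts, so running this once on the positive reviews and once on the negative reviews gives the two clouds the conundrum asks for.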


Level 6

Thanks for the insight.

Here is a first attempt at a word cloud from the data. This does not answer the whole conundrum. It uses all of the words, sized by number of occurrences, and does not yet separate them by sentiment.

WordCloud.jpg

@AnitaC I don't see much in the documentation about creating a chart-type plugin. Saving the image files to a folder does not seem like a very DSS-like way to do this. I know there is a way to package a chart into a plugin, because there are other charts in the app store. However, I'm not finding much documentation on setting this up. Can you point me to the details?

P.S. A big thanks to Katie Gross of Dataiku for a lot of help here.

--Tom
Level 6

Here are a couple:

Words in Positive reviews

sized by the number of times each word appears

Positive Cloud.jpg

Words in Negative reviews

sized by the number of times each word appears

Negitive Cloud.jpg

 

However, the two clouds overlap heavily. I'm starting to think about ways to pull out the words that show up mainly in the positive word cloud and those that show up mainly in the negative word cloud.
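One possible way to do that, sketched here in Python as an assumption rather than what was actually used for the images: score each word by the smoothed log ratio of how often it appears in positive versus negative reviews, then keep the extremes. The function name distinctive_words and the counts below are invented for illustration.

```python
import math
from collections import Counter

def distinctive_words(pos_counts, neg_counts, top_n=10):
    """Rank words by the smoothed log ratio of their positive vs negative rates."""
    vocab = set(pos_counts) | set(neg_counts)
    # Add-one smoothing so words missing from one side do not divide by zero.
    pos_total = sum(pos_counts.values()) + len(vocab)
    neg_total = sum(neg_counts.values()) + len(vocab)
    scores = {
        w: math.log((pos_counts.get(w, 0) + 1) / pos_total)
           - math.log((neg_counts.get(w, 0) + 1) / neg_total)
        for w in vocab
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Highest scores lean positive, lowest scores lean negative.
    return ranked[:top_n], ranked[::-1][:top_n]

# Invented counts, only to show the behaviour on overlapping vocabularies.
pos = Counter({"dress": 50, "love": 40, "perfect": 30, "fit": 20})
neg = Counter({"dress": 45, "return": 35, "cheap": 25, "fit": 22})
most_positive, most_negative = distinctive_words(pos, neg, top_n=2)
```

Words such as "dress" and "fit" that are common on both sides score close to zero and drop out, which is the overlap problem described above.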

 

--Tom
Level 6

Here are two more.

Words in Positive Posts

with predicted confidence of positive post >= .95
n = 15,009 posts

Positive greater than 95 confidence.jpg

 

Words in Negative Posts

with predicted confidence of negative post >= .95
n = 3,332 posts

Negitive greater than 95 confidence v2.jpg

In these word clouds, we have removed all of the posts where the prediction was unsure whether the post was positive or negative, that is, where the confidence was below .95.
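As a rough sketch of that filtering step in Python: the column names (review_text, predicted_sentiment, confidence) are hypothetical and will differ depending on what the sentiment step outputs in your flow, and the rows are made up.

```python
def confident_reviews(rows, label, threshold=0.95):
    """Keep review texts predicted as `label` with confidence >= threshold."""
    return [
        r["review_text"]
        for r in rows
        if r["predicted_sentiment"] == label and r["confidence"] >= threshold
    ]

# Made-up rows with hypothetical column names.
rows = [
    {"review_text": "Love it", "predicted_sentiment": "positive", "confidence": 0.99},
    {"review_text": "It is okay", "predicted_sentiment": "positive", "confidence": 0.61},
    {"review_text": "Fell apart", "predicted_sentiment": "negative", "confidence": 0.97},
]
positive_texts = confident_reviews(rows, "positive")
negative_texts = confident_reviews(rows, "negative")
```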

That's all for me for now.

--Tom
Level 6

So, today I've been playing with this a bit.  

I ended up building a logistic regression model to identify which words (and bi-grams and tri-grams) drive a Positive or Negative review.

In the visualizations below I am weighting the size of the words by the regression coefficients of that model.

Most Positive Words 

by regression coefficients.

Positive Words by Coefficent.jpg

 

Most Negative Words 

by regression coefficients.

Negitive Words by Coefficent.jpg
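For anyone who wants to try the coefficient-weighted approach in code, here is a hedged Python sketch using scikit-learn. It is not the model behind the images above; the function names are invented, the settings are guesses, and the example coefficients at the bottom are made up to show the splitting step without needing to train anything.

```python
def fit_ngram_coefficients(texts, labels):
    """Fit logistic regression on 1- to 3-gram counts; return {ngram: coefficient}.

    Requires scikit-learn. `labels` are 1 for positive and 0 for negative reviews,
    so a positive coefficient pushes a review towards the positive class.
    """
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    vectorizer = CountVectorizer(ngram_range=(1, 3))
    features = vectorizer.fit_transform(texts)
    model = LogisticRegression(max_iter=1000).fit(features, labels)
    return dict(zip(vectorizer.get_feature_names_out(), model.coef_[0]))

def split_by_sign(coefficients):
    """Split into positive and negative weight maps, both with sizes > 0."""
    positive = {term: c for term, c in coefficients.items() if c > 0}
    negative = {term: -c for term, c in coefficients.items() if c < 0}
    return positive, negative

# Made-up coefficients, standing in for the output of fit_ngram_coefficients.
example = {"love": 1.8, "perfect fit": 1.2, "cute but": -0.9, "not worth": -1.5}
positive_weights, negative_weights = split_by_sign(example)
```

Each of the two weight maps can then be passed to a word cloud renderer that accepts frequencies, so that size reflects the strength of the coefficient rather than raw counts.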

 

I'd like to invite others to jump in here with some further ideas.

 

--Tom
Level 6

Looking at the negative phrases, I keep seeing patterns like:

  • really wanted ____________ 
  • beautiful but _____________
  • cute but ___________
  • not worth ___________

In all of these cases, I'd really like to know what follows these phrases, which are the most indicative of a potential problem. I have not figured out a way to pull this information yet. Anyone got an idea?
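One idea, offered as a rough Python sketch rather than a tested solution: use a regular expression to capture the next few words after each trigger phrase and count the continuations. The function name what_follows, the three-word window, and the sample reviews are all assumptions for illustration.

```python
import re
from collections import Counter

TRIGGERS = ["really wanted", "beautiful but", "cute but", "not worth"]

def what_follows(reviews, triggers=TRIGGERS, n_words=3):
    """Map each trigger phrase to a Counter of the next few words after it."""
    results = {t: Counter() for t in triggers}
    for text in reviews:
        lowered = text.lower()
        for trigger in triggers:
            # Capture up to n_words words that directly follow the trigger.
            pattern = (r"\b" + re.escape(trigger)
                       + r"\s+((?:[\w']+\s*){1,%d})" % n_words)
            for match in re.finditer(pattern, lowered):
                results[trigger][match.group(1).strip()] += 1
    return results

# Made-up reviews, only to show the output shape.
sample = [
    "Cute but too short for me.",
    "Beautiful but the zipper broke.",
    "Not worth the price.",
]
follow_ups = what_follows(sample)
```

Calling follow_ups["cute but"].most_common(20) on the real data would list the most frequent continuations, which could then feed a word cloud per trigger phrase.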

--Tom
Dataiker

Thanks for reminding me about the sentiment analysis plugin!
