Today’s conundrum is the second in our ‘Fashion Reviews’ series. Today we take a look at the reviews we cleaned up last time and aim to analyse their sentiment. With a little luck we can begin to identify some common themes.
What jumps out at you from each cloud? Are there any insights you can draw on what things people like and what might need improvement?
For bonus points: try making your word cloud code (R or Python) into a plugin!
PS: Need more practice using the Sentiment analysis plugin? Give Conundrum 18 a go!
Katie Gross did a nice presentation on NLP last week. However, I'm having difficulties finding the Word Cloud Plugin.
I found this on GitHub.
But it does not look complete.
And this GitHub repo for the word cloud plugin is empty at this time.
The NLP BrightTalk presentation of earlier this week references a word cloud plugin at
@MichaelG I could build a shiny app to do a word cloud. However, that feels like more work than I want to do for a Conundrum.
Hi @tgb417 ,
There is no specific plugin for word clouds in DSS as of today. What @MichaelG was suggesting was to write your own code (in R or Python) that would, for example, save the word cloud image in a managed folder. You could then turn that code into a plugin (could be useful for your 'non-coder' colleagues or even yourself if you think you will want to generate more word clouds in the future!).
You might want to check the wordcloud package (same name for R and Python). A word cloud can be generated in very few lines of code in either language!
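For anyone who wants a starting point in Python, here is a minimal sketch of that idea. It assumes the third-party `wordcloud` package is installed. The stop-word list, the sample reviews, the function names and the output file name are all made up for illustration, and the image is written to a plain local path rather than to a DSS managed folder.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis needs a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "i", "to", "of", "in", "but"}


def word_frequencies(reviews):
    """Count how often each non-stop word occurs across all reviews."""
    counts = Counter()
    for text in reviews:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts


def save_word_cloud(frequencies, path):
    """Render the frequencies as an image file (needs the `wordcloud` package)."""
    # Imported lazily so the counting step above works without the package.
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)


# Made-up sample reviews, only to show the shape of the input.
sample_reviews = [
    "Love this dress, the fabric is soft",
    "The dress runs small but the fabric is lovely",
]
freqs = word_frequencies(sample_reviews)

# Uncomment to write the image once `wordcloud` is installed:
# save_word_cloud(freqs, "word_cloud.png")
```

Keeping the counting separate from the rendering means the same frequencies can later be filtered or reweighted before they are drawn.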
Thanks for the insight.
Here is a first attempt at a word cloud from the data. This does not answer the whole conundrum. It is built from all of the words, sized by number of occurrences, not from words segregated by sentiment.
@AnitaC I don't see a lot about creating a chart-type plugin in the documentation. Saving the image files to a folder does not seem to be a very DSS-like way to do this. I know there is a way to package a chart into a plugin, because there are other charts in the app store. However, I'm not seeing a lot of documentation on setting this up. Can you point me to the details?
P.S. A big thanks to Katie Gross of Dataiku for a lot of help here.
Here are a:
However, there is a huge overlap. I'm starting to think about methods to maybe pull words that just show up in the positive word cloud and those that show up in the negative word cloud.
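One simple way to try that idea is a set difference on the two vocabularies. This is only a sketch: the function name and the counts below are invented, and a strict "appears in one cloud only" rule will also drop words that are merely much more common on one side.

```python
def exclusive_words(positive_freqs, negative_freqs):
    """Keep only the words that occur in one sentiment group and not the other."""
    only_positive = {
        word: count
        for word, count in positive_freqs.items()
        if word not in negative_freqs
    }
    only_negative = {
        word: count
        for word, count in negative_freqs.items()
        if word not in positive_freqs
    }
    return only_positive, only_negative


# Made-up word counts for each sentiment group.
positive = {"dress": 40, "love": 25, "perfect": 12}
negative = {"dress": 30, "returned": 9, "cheap": 7}

only_pos, only_neg = exclusive_words(positive, negative)
```

Each resulting dictionary can be passed to a word cloud renderer that accepts word frequencies.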
Here are two more.
In these word clouds, we are removing all of the posts for which we were unsure if the post was positive or negative.
That's all for me for now.
So, today I've been playing with this a bit.
I ended up building a logistic regression model that would predict what words (and tri-grams and bi-grams) drive a Positive or Negative review.
In the visualizations below I am weighting the size of the words by the regression coefficients of that model.
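For readers who want to try something similar in code, here is a rough sketch of one way to do it with scikit-learn. It is not the exact model described above: the choice of `CountVectorizer`, the function names, and the example coefficients are all assumptions made for illustration.

```python
def fit_ngram_coefficients(texts, labels):
    """Fit a logistic regression on 1- to 3-gram counts.

    `labels` is assumed to hold 1 for positive and 0 for negative reviews.
    Returns a dict mapping each n-gram to its regression coefficient.
    Requires scikit-learn.
    """
    # Imported lazily so the helper below can be used without scikit-learn.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    vectorizer = CountVectorizer(ngram_range=(1, 3))
    features = vectorizer.fit_transform(texts)
    model = LogisticRegression(max_iter=1000)
    model.fit(features, labels)
    return dict(zip(vectorizer.get_feature_names_out(), model.coef_[0]))


def split_by_sign(coefficients):
    """Turn signed coefficients into two sets of positive word cloud weights."""
    positive = {term: coef for term, coef in coefficients.items() if coef > 0}
    negative = {term: -coef for term, coef in coefficients.items() if coef < 0}
    return positive, negative


# Made-up coefficients, standing in for the output of fit_ngram_coefficients.
example_coefficients = {
    "love": 1.8,
    "fits perfectly": 1.1,
    "going back": -1.6,
    "wanted to love": -0.9,
}
pos_weights, neg_weights = split_by_sign(example_coefficients)
```

Negative coefficients are flipped to positive weights because word cloud renderers generally size words by a non-negative value.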
I'd like to invite others to jump in here with some further ideas.
In looking at the negative phrases I'm seeing all of these phrases like
In all of these cases I'd really like to know what follows these most indicative phrases about a potential problem. I have not figured out a way to pull this information yet. Does anyone have an idea?
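One possible approach, sketched below, is to search each review for the phrase and count the next few words that follow it. The function name, the trigger phrase "unfortunately the" and the sample reviews are hypothetical and are not taken from the data in this thread.

```python
import re
from collections import Counter


def continuations(reviews, phrase, n_words=4):
    """Count the word sequences (up to n_words long) that directly follow `phrase`."""
    pattern = re.compile(
        r"\b" + re.escape(phrase.lower())
        + r"\s+((?:[\w']+\s*){1," + str(n_words) + r"})"
    )
    counts = Counter()
    for text in reviews:
        for match in pattern.finditer(text.lower()):
            # Punctuation ends the captured run, so a continuation never
            # crosses into the next sentence.
            counts[match.group(1).strip()] += 1
    return counts


# Made-up reviews and a hypothetical trigger phrase.
sample = [
    "Unfortunately the zipper broke after one wear.",
    "Nice color. Unfortunately the zipper broke after one wash.",
    "Unfortunately the sleeves are too tight",
]
following = continuations(sample, "unfortunately the", n_words=2)
```

Sorting the result with `following.most_common()` gives the most frequent continuations first, and these counts could also be drawn as their own word cloud.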