Today's conundrum is the second in our "Fashion Reviews" series. Today we take a look at the reviews we cleaned up last time and aim to analyse their sentiment. With a little luck we can begin to identify some common themes.
Using the cleaned data you generated when completing Conundrum 17, and the Sentiment analysis plugin, can you create a word cloud for the positive reviews and another for the negative reviews?
What jumps out at you from each cloud? Are there any insights you can draw on what things people like and what might need improvement?
For bonus points: try making your word cloud code (R or Python) into a plugin!
Good luck!
PS: Need more practice using the Sentiment analysis plugin? Give Conundrum 18 a go!
Katie Gross did a nice presentation on NLP last week. However, I'm having difficulty finding the Word Cloud plugin.
I found this on GitHub.
https://github.com/dataiku/dss-plugin-nlp-visualization
But it does not look complete.
And this GitHub repo for the word cloud plugin is empty at this time.
https://github.com/dataiku/dss-plugin-wordcloud
The NLP BrightTalk presentation from earlier this week references a word cloud plugin:
https://www.brighttalk.com/webcast/17108/430154?
@MichaelG I could build a Shiny app to do a word cloud. However, that feels like more work than I want to do for a Conundrum.
Hi @tgb417 ,
There is no specific plugin for word clouds in DSS as of today. What @MichaelG was suggesting was to write your own code (in R or Python) that would, for example, save the word cloud image in a managed folder. You could then turn that code into a plugin (could be useful for your 'non-coder' colleagues or even yourself if you think you will want to generate more word clouds in the future!).
You might want to check the wordcloud package (same name for R and Python). A word cloud can be generated in very few lines of code with either!
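As a minimal sketch of that suggestion in Python: count word frequencies with the standard library, then hand them to the `wordcloud` package if it is installed. The sample reviews, the stop-word list, and the function names are made up for illustration, and this is not an official DSS recipe; in DSS you would write the image to a managed folder rather than a local path.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "and", "is", "it", "i", "this", "but", "too", "to", "of"}

def word_frequencies(texts):
    """Count lower-cased words across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts

def save_word_cloud(frequencies, path):
    """Render the frequencies to an image; returns False if wordcloud is missing."""
    try:
        from wordcloud import WordCloud  # third-party: pip install wordcloud
    except ImportError:
        return False
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(dict(frequencies))
    cloud.to_file(path)
    return True

# Made-up sample reviews, for illustration only.
reviews = [
    "I love this dress, the fabric is soft",
    "The dress runs small but the fabric is lovely",
]
freqs = word_frequencies(reviews)
```

Calling `save_word_cloud(freqs, "wordcloud.png")` would then write the image. Running the same two functions once on the positive reviews and once on the negative reviews gives the two clouds the conundrum asks for.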
Thanks for the insight.
Here is a first attempt at a word cloud from the data. This does not answer the whole conundrum: it uses all of the words, sized by number of occurrences, and is not yet split by sentiment.
@AnitaC I don't see a lot about creating a chart-type plugin in the documentation. Saving the image files to a folder does not seem to be a very DSS-like way to do this. I know there is a way to package a chart into a plugin, because there are other charts in the app store. However, I'm not finding much documentation on setting this up. Can you point me to the details?
P.S. A big Thanks to Katie Gross of Dataiku for a lot of help here.
Here are a:
However, there is a huge overlap. I'm starting to think about ways to pull out the words that show up only in the positive word cloud and those that show up only in the negative word cloud.
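One way this could be sketched, assuming you already have a word count per sentiment class: keep only the words that appear in one class's top N and not in the other's. The counts below are invented, and `exclusive_words` is a hypothetical helper name.

```python
from collections import Counter

def exclusive_words(pos_counts, neg_counts, top_n=100):
    """Keep words in one class's top_n that are absent from the other's top_n."""
    pos_top = dict(pos_counts.most_common(top_n))
    neg_top = dict(neg_counts.most_common(top_n))
    only_pos = {w: c for w, c in pos_top.items() if w not in neg_top}
    only_neg = {w: c for w, c in neg_top.items() if w not in pos_top}
    return only_pos, only_neg

# Invented counts, for illustration only.
pos_counts = Counter({"dress": 40, "love": 30, "fit": 25, "soft": 12})
neg_counts = Counter({"dress": 35, "fit": 20, "return": 18, "cheap": 9})
only_pos, only_neg = exclusive_words(pos_counts, neg_counts)
```

Strict exclusion throws away words that are merely much more common on one side. A softer alternative would be to rank words by the ratio of their relative frequencies in the two classes.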
Here are two more.
In these word clouds, we have removed all of the posts where we were unsure whether the post was positive or negative.
That's all for me for now.
So, today I've been playing with this a bit.
I ended up building a logistic regression model to identify which words (and bi-grams and tri-grams) drive a positive or negative review.
In the visualizations below I am weighting the size of the words by the regression coefficients of that model.
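For anyone who wants to try something similar, here is a rough, dependency-free sketch of the idea. It is not the model used for the visualizations in this thread: the reviews, labels, and hyperparameters are invented, and in practice a library such as scikit-learn would be the more natural choice. It fits a bag-of-n-grams logistic regression by plain gradient descent and then splits the coefficients by sign so each side can be used as word-cloud sizes.

```python
import math
from collections import Counter

def ngrams(text, max_n=3):
    """Return all 1-, 2- and 3-grams of a whitespace-tokenised text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def fit_logistic(docs, labels, epochs=200, lr=0.5, l2=0.01):
    """Fit an L2-regularised logistic regression by batch gradient descent."""
    feats = [Counter(ngrams(d)) for d in docs]
    weights = Counter()
    bias = 0.0
    n = len(docs)
    for _ in range(epochs):
        grad = Counter()
        grad_b = 0.0
        for x, y in zip(feats, labels):
            z = bias + sum(weights[t] * c for t, c in x.items())
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(positive)
            err = p - y
            for t, c in x.items():
                grad[t] += err * c
            grad_b += err
        for t, g in grad.items():
            weights[t] -= lr * (g / n + l2 * weights[t])
        bias -= lr * grad_b / n
    return dict(weights)

# Invented reviews and labels (1 = positive, 0 = negative).
docs = ["love this dress", "great fit love it",
        "had to return it", "poor quality had to return"]
labels = [1, 1, 0, 0]

coef = fit_logistic(docs, labels)
# Split by sign and use magnitudes, so both clouds get positive sizes.
positive_sizes = {t: w for t, w in coef.items() if w > 0}
negative_sizes = {t: -w for t, w in coef.items() if w < 0}
```

Each dictionary can be passed to a frequency-based word cloud renderer in place of raw counts.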
I'd like to invite others to jump in here with some further ideas.
In looking at the negative phrases I'm seeing all of these phrases like
In all of these cases I'd really like to know what follows these phrases, since they are the ones most indicative of a potential problem. I have not figured out a way to pull this information yet. Does anyone have an idea?
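One possible approach, sketched with the standard library only: search each review for the phrase and capture the next few words after it (a simple keyword-in-context extraction). The phrase "wanted to love" and the sample reviews are made up for illustration, and `following_words` is a hypothetical helper.

```python
import re

def following_words(texts, phrase, n_words=5):
    """For every occurrence of phrase, return up to n_words words that follow it."""
    pattern = re.compile(
        r"\b" + re.escape(phrase) + r"\b((?:\W+\w+){1,%d})" % n_words,
        re.IGNORECASE,
    )
    contexts = []
    for text in texts:
        for match in pattern.finditer(text):
            # Drop punctuation and keep just the captured words.
            contexts.append(" ".join(re.findall(r"\w+", match.group(1))))
    return contexts

# Made-up reviews, for illustration only.
reviews = [
    "I really wanted to love this top but the fabric was itchy.",
    "Wanted to love it, sadly the zipper broke.",
]
contexts = following_words(reviews, "wanted to love", n_words=4)
```

The captured contexts could then be counted or fed into their own word cloud to see what tends to come after each problem phrase.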
I've come back to look at this Conundrum.
I want to pull out any named entities from these reviews. Things like dresses, skirts, slacks, tops... And in some ways, I've gotten things like condition sizes, colors, quality. However, I'd like to be able to pull these out more explicitly.
I tried the Named Entity Recognition plugin; however, the results using spaCy did not seem to be that good. So, I'm wondering about getting hold of some of the Flair models. However, I've not figured out
Can anyone familiar with this plugin comment?
Thanks for any help you can share.
cc: @duphan
Thanks for reminding me about the sentiment analysis plugin!