In this online event, Katie Gross (Lead Data Scientist, Dataiku) walked through a project built in Dataiku DSS that applies NLP to data from popular cooking sites to predict whether a recipe is likely to be highly rated.
Presentation abstract: Want to judge whether your recipe will be a hit? Or in general, what user-generated content is likely to lead to high engagement? We developed a workflow in Dataiku DSS that uses NLP to predict which recipes are likely to be highly rated. We’ll walk you through how we webscraped text recipes from popular recipe-sharing sites like Allrecipes and Epicurious, cleansed and prepared the data, and built a machine learning model to predict ratings of future recipes.
NLP is a field of AI that enables machines to read, understand, and derive meaning from human languages.
A Text Featurization Pipeline converts text into features for a machine learning model. It starts by preprocessing the text (normalize, remove stop words, stem, and tokenize), so that "I was running to the river and jumped over a log" becomes ["i", "run", "river", "jump", "log"].
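The preprocessing step can be sketched in a few lines of plain Python. This is only an illustration, not the recipe used in the Dataiku DSS project: the stop-word list and the `toy_stem` helper are hypothetical stand-ins chosen so that the example sentence reproduces the output quoted above. A real pipeline would more likely rely on an established stop-word list and stemmer, which would probably also drop "i".

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "over", "the", "to", "was"}


def toy_stem(token):
    """Crude suffix stripper for illustration; not a real stemming algorithm."""
    for suffix in ("ing", "ed"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # Undo a doubled final consonant, e.g. "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return token


def preprocess(text):
    """Normalize (lowercase), tokenize, remove stop words, then stem."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [toy_stem(t) for t in kept]


print(preprocess("I was running to the river and jumped over a log"))
# ['i', 'run', 'river', 'jump', 'log']
```

The order of operations is a design choice: stop words are removed before stemming here so that the list can hold plain dictionary forms.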
The preprocessed text is then vectorized (converted to numeric features) using either Count Vectorization or Term Frequency-Inverse Document Frequency (TF-IDF).
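To make the difference between the two vectorization options concrete, here is a minimal pure-Python sketch. The three tokenized "recipes" are invented for illustration, and the TF-IDF weight uses the textbook form (raw term count times log(N / document frequency)); libraries and tools often apply smoothed or normalized variants, so actual values in a given product may differ.

```python
import math
from collections import Counter

# Made-up, already-preprocessed documents (assumption for this sketch).
docs = [
    ["bake", "chicken", "garlic"],
    ["bake", "chocolate", "cake"],
    ["grill", "chicken", "lemon", "chicken"],
]

# Fixed, sorted vocabulary so every document maps to a vector of the same length.
vocab = sorted({tok for doc in docs for tok in doc})


def count_vectorize(doc):
    """Count Vectorization: one entry per vocabulary term, holding its raw count."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def tfidf_vectorize(doc):
    """TF-IDF (unsmoothed textbook variant): count * log(N / document frequency)."""
    n_docs = len(docs)
    counts = Counter(doc)
    vector = []
    for term in vocab:
        df = sum(1 for d in docs if term in d)  # documents containing the term
        vector.append(counts[term] * math.log(n_docs / df))
    return vector


print(vocab)
print(count_vectorize(docs[2]))
print(tfidf_vectorize(docs[2]))
```

In the count vector, "chicken" gets the largest weight in the third document simply because it appears twice. TF-IDF scales that count down because "chicken" also appears in another document, while terms unique to one document, such as "grill" and "lemon", receive a higher weight per occurrence.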
Deep dive into recipe reviews in Dataiku DSS.
Katie Gross is a Data Scientist at Dataiku, where she helps clients across industries develop AI solutions using Dataiku DSS. Previously, she worked as a data scientist at a marketing science firm, Schireson, and spent several months as a freelance data scientist. Before moving into data science, Katie spent three years as a CPG consultant at Nielsen. Katie holds a BA in Economics from Colgate University.
Any questions on the presentation? Resources to share on NLP? Feel free to continue the discussion below!