
Customer Predictive Analytics - Watch on Demand

Community Manager

On Wednesday, June 3rd, @ben_p (Data Science Manager, MandM Direct) presented his DSS project to predict the likelihood that a customer will return to a website.

The presentation was followed by a talk from Leo Treguer, Data Scientist at Dataiku, who shared learnings from his experience developing customer churn models.

Ben explained how his team brought data from various sources together in BigQuery, then used DSS to prepare it. The result is a machine learning model that estimates the probability of a customer returning to the website in the days after their visit.
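To make the prediction target concrete, here is a minimal Python sketch of one way to label visits for this kind of model. It assumes a simple visit log of `(customer_id, visit_date)` pairs and a 7-day return window; the function name `label_returns` and the window length are hypothetical, not details from Ben's project. Visits whose window runs past the end of the observation period are skipped, because we can't yet know whether those customers will come back.

```python
from collections import defaultdict
from datetime import date, timedelta


def label_returns(visits, observation_end, window_days=7):
    """Label each visit 1 if the same customer comes back within
    `window_days` days, else 0 (hypothetical target definition)."""
    window = timedelta(days=window_days)
    days_by_customer = defaultdict(set)
    for customer_id, visit_day in visits:
        days_by_customer[customer_id].add(visit_day)  # one entry per visit day

    labels = []
    for customer_id in sorted(days_by_customer):
        days = sorted(days_by_customer[customer_id])
        for i, day in enumerate(days):
            if day + window > observation_end:
                continue  # return window not fully observed yet: skip rather than mislabel
            returned = i + 1 < len(days) and days[i + 1] - day <= window
            labels.append((customer_id, day, int(returned)))
    return labels


# Usage with a tiny, made-up visit log
visits = [
    ("a", date(2020, 6, 1)),
    ("a", date(2020, 6, 5)),
    ("a", date(2020, 6, 20)),
    ("b", date(2020, 6, 2)),
    ("b", date(2020, 6, 28)),
]
for row in label_returns(visits, observation_end=date(2020, 6, 30)):
    print(row)
```

Skipping the censored visits (here, customer "b" on June 28th) avoids training on false "did not return" labels at the end of the data.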

Key takeaways from Ben's presentation:

  • Have a clear goal, and put a lot of time and thought into defining the question you are aiming to answer
  • Think about the final action early on in the process
  • Spend a lot of time working on features and... let DSS do the hard work!
  • Your input data: check, recheck and check again!
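As an illustration of the kind of customer-level features this can involve, the sketch below computes recency and frequency from a visit log of `(customer_id, visit_date)` pairs. The function `customer_features`, the feature names and the 30-day lookback are all hypothetical; the point it shows is computing features only from data available as of a snapshot date, so the model never sees the future.

```python
from datetime import date, timedelta


def customer_features(visits, as_of, lookback_days=30):
    """Build simple per-customer features from (customer_id, visit_date) pairs,
    using only visits on or before `as_of` (hypothetical feature set)."""
    lookback_start = as_of - timedelta(days=lookback_days)
    last_visit, recent_visits, visit_days = {}, {}, {}
    for customer_id, day in visits:
        if day > as_of:
            continue  # ignore future visits so the target can't leak into features
        last_visit[customer_id] = max(day, last_visit.get(customer_id, day))
        recent_visits[customer_id] = recent_visits.get(customer_id, 0) + int(day > lookback_start)
        visit_days.setdefault(customer_id, set()).add(day)
    return {
        cid: {
            "recency_days": (as_of - last_visit[cid]).days,  # days since last visit
            "visits_last_n_days": recent_visits[cid],  # visit rows inside the lookback
            "distinct_visit_days": len(visit_days[cid]),  # number of different visit days
        }
        for cid in last_visit
    }


# Usage with a made-up visit log; the July visit is after the snapshot and is ignored
visits = [
    ("a", date(2020, 6, 1)),
    ("a", date(2020, 6, 10)),
    ("a", date(2020, 6, 10)),
    ("b", date(2020, 5, 1)),
    ("a", date(2020, 7, 5)),
]
features = customer_features(visits, as_of=date(2020, 6, 15))
print(features)
```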

Best practices and pitfalls from Leo's experience:

Best practices:

  • Focusing on a precise context, e.g. only new customers’ churn in their first 2 weeks of usage
  • Creating relevant customer features
  • Explaining analyses to non-data scientists to maximize impact

Pitfalls:

  • Not checking input data enough
  • Expecting machines to understand the business
  • Including too many outliers
  • Not involving other stakeholders enough
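One common rule of thumb for spotting outliers before training is Tukey's interquartile-range (IQR) fences: flag values below Q1 - k*IQR or above Q3 + k*IQR, with k = 1.5 by convention. The sketch below is an assumed example using Python's `statistics.quantiles`, not necessarily how Leo handles outliers; whether to drop, cap or keep flagged customers remains a business decision.

```python
import statistics


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using Tukey's IQR fences.
    Needs at least two data points; k=1.5 is the conventional default."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Usage: made-up average order values per customer, with one extreme spender
order_values = [20, 22, 25, 24, 23, 21, 26, 500]
kept, outliers = split_outliers(order_values)
print("kept:", kept)
print("outliers:", outliers)
```

Because the fences are based on quartiles rather than the mean, a single extreme value like 500 does not drag the thresholds towards itself.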

Ben has been working with data and analytics in the online retail space for 4 years. He leads a team at MandM Direct tasked with extracting maximum value from big data and using it to drive actions that benefit both the business and its customers. Ben has a diverse background: he studied illustration at university and practiced it for a number of years, then spent time in sales and marketing before applying his passion for data and customer experience to his current role.


What's your experience with Customer Predictive Analytics? Any best practices to share, or thoughts on Ben's and Leo's perspectives? 

Community Manager

We weren't able to answer every question yesterday. Thank you @ben_p for taking the time to answer these questions:

  1. How did you unify the audience IDs? Was that done before ingesting the datasets into DSS?
    Yes, it was. Depending on your datasets, this can be one of the most complex steps. When we explored which datasets to join in, the first thing we looked at was the keys: can we actually join this data at customer level? If not, we can’t use it.
  2. How big is the dataset? How long does the model take to run?
    The source dataset has around 5 million rows, and we model on a sample of it. Finding the right sample size took some trial and error, to make sure we train on enough examples while keeping training as efficient as possible. Once the features are reduced to the most impactful ones, training with XGBoost takes about 50 minutes.
  3. How did you deploy the model?
    We run an automation node with DSS, so when we are happy with a model we transfer it to this node and schedule it to run with a scenario. DSS makes this process much faster than it used to be!
  4. How important is data sampling when we have unique customer data?
    I’m not sure what you mean by “unique customer data”, but feel free to drop a reply on the community and we can discuss more.
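Ben's first check, whether a dataset can be joined at customer level at all, can be quantified before anything is ingested. The sketch below is a hypothetical helper (`join_key_report`, not a DSS feature) that reports what share of customer IDs in a base dataset also appear in a candidate dataset, plus duplicate and null keys. In practice the same check might be written as a SQL query in BigQuery, and any acceptance threshold is up to you.

```python
from collections import Counter


def join_key_report(base_ids, other_ids):
    """Summarise how well `other_ids` can be joined onto `base_ids`
    at customer level (hypothetical helper, not a DSS feature)."""
    base_ids, other_ids = list(base_ids), list(other_ids)
    base = {i for i in base_ids if i is not None}
    other_non_null = [i for i in other_ids if i is not None]
    other = set(other_non_null)
    counts = Counter(other_non_null)
    return {
        # share of base customers we could enrich from the other dataset
        "coverage": len(base & other) / len(base) if base else 0.0,
        "missing_in_other": len(base - other),
        # keys appearing more than once would multiply rows on a join
        "duplicate_keys_in_other": sum(1 for c in counts.values() if c > 1),
        "null_keys_in_other": len(other_ids) - len(other_non_null),
    }


# Usage with made-up IDs: CRM customers vs. a web analytics export
crm_ids = ["c1", "c2", "c3", "c4"]
web_ids = ["c1", "c2", "c2", None, "c9"]
print(join_key_report(crm_ids, web_ids))
```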

As Ben said, please feel free to ask any other questions you may have below in this thread.