Welcome to Conundrum 8! This week we will be investigating a theory about customer behavior.
You are given the Haiku Tshirts Orders dataset. The hypothesis in question is this: returning customers tend to buy more in orders after their first.
Can you build a Flow and create a chart that tests this hypothesis and proves its truth or falsehood - for these customers at least?
Hint: Some customers have only one order, while others may have up to nine orders.
If you would like to share your results or chart with the community, feel free to do so here in the form of screenshots, a text-based description, a project upload, or all three!
If you would like to upload your project please refer to Submission guidelines.
Hi Corey, can you give us a bit more info on what you mean by this hypothesis? I am inclined to interpret it as "first orders are of a smaller size than subsequent orders". Is that indeed what is meant?
Yes that is what we mean - I'll edit the post to make that a little more clear.
Thanks for pointing out the lack of clarity!
I had some doubt about the proper statistical way to test the hypothesis. What I ended up doing was to compare, per customer, the size of their first order with the average size of all subsequent orders. This gives me a difference variable, whose mean is not significantly different from 0 according to a simple t-test. The differences are not normally distributed, but considering the large sample size I don't consider that a problem. The project is attached and you can find the t-test in the final dataset.
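For readers who want to try the same idea outside a Flow, here is a minimal sketch of the approach described above: one difference per returning customer (mean size of later orders minus size of the first order), followed by a one-sample t statistic on those differences. The rows, the field names (`customer_id`, `order_date`, `order_size`) and both helper functions are hypothetical and not taken from the attached project. The p-value uses a normal approximation, which is only reasonable for a large sample like the real dataset, not for the toy data shown here.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical toy rows: (customer_id, order_date, order_size).
# The real dataset's column names may differ.
orders = [
    ("a", "2020-01-01", 2), ("a", "2020-02-01", 3), ("a", "2020-03-01", 1),
    ("b", "2020-01-05", 1), ("b", "2020-01-20", 1),
    ("c", "2020-02-10", 4),  # single order: no later orders to compare with
    ("d", "2020-01-15", 3), ("d", "2020-04-01", 2), ("d", "2020-05-01", 5),
]


def first_vs_rest_differences(rows):
    """Per returning customer: mean size of later orders minus first order size."""
    by_customer = defaultdict(list)
    for customer_id, order_date, size in rows:
        by_customer[customer_id].append((order_date, size))

    diffs = {}
    for customer_id, items in by_customer.items():
        items.sort()  # ISO-formatted dates sort chronologically as strings
        sizes = [size for _, size in items]
        if len(sizes) > 1:  # skip customers with only one order
            diffs[customer_id] = mean(sizes[1:]) - sizes[0]
    return diffs


def one_sample_t(values):
    """Return (t statistic, degrees of freedom, approximate two-sided p-value).

    Tests whether the mean of `values` differs from 0. The p-value uses the
    standard normal distribution as a large-sample stand-in for Student's t.
    Needs at least two values with non-zero spread.
    """
    n = len(values)
    t = mean(values) / (stdev(values) / sqrt(n))
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, n - 1, p_approx


diffs = first_vs_rest_differences(orders)
t, df, p_approx = one_sample_t(list(diffs.values()))
print(diffs, t, df, p_approx)
```

A positive mean difference with a small p-value would support the hypothesis; a mean that cannot be told apart from 0 would not.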
Hi!!! What a creative way to solve this conundrum! According to your t-test, the mean difference between the two groups (size of first order vs. size of subsequent orders) is not significantly different from zero, which would mean that, on average, the customer's subsequent orders are not increasing in size. Great job!
Hi @antonstam, looking at the helpful documentation in your project's Flow, I noticed that multiple recipes could be aggregated into one using the Window recipe. I replicated the hypothesis test and extended it to also check the difference between each order and the one that follows it, in terms of both t-shirt quantity and total amount (price × quantity).
@taraku's interpretation of your solution was quite helpful, so I'm quoting and paraphrasing her to report mine:
According to these t-tests, the difference between the means of the two groups (size and amount of first order vs. all other orders, as well as each order vs. the subsequent one) cannot be distinguished from zero. This would mean that, on average, customers' order quantities and amounts are not increasing, either from the first order to the rest or from each order to the next.
Here is what the Flow looks like and a screenshot of the Window recipe. Let me know what you think.
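For anyone reading without the screenshots, the windowing idea can be sketched roughly as follows: order each customer's rows by date, then subtract the previous order's quantity and amount from the current one, the way a lag over a per-customer window would. This is only an illustration in plain Python, not the Window recipe's actual configuration; the rows, the field names and the `lagged_differences` helper are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (customer_id, order_date, quantity, unit_price).
orders = [
    ("a", "2020-01-01", 2, 10.0),
    ("a", "2020-02-01", 3, 10.0),
    ("a", "2020-03-01", 1, 20.0),
    ("b", "2020-01-05", 1, 15.0),
    ("b", "2020-01-20", 4, 15.0),
    ("c", "2020-02-10", 4, 10.0),  # single order: produces no difference row
]


def lagged_differences(rows):
    """Return (customer_id, order_rank, quantity_diff, amount_diff) tuples.

    One tuple per order that has a predecessor for the same customer; each
    diff is the current order minus the previous one.
    """
    result = []
    # Sort by customer, then by date (ISO dates sort correctly as strings).
    ordered = sorted(rows, key=itemgetter(0, 1))
    for customer_id, group in groupby(ordered, key=itemgetter(0)):
        previous = None
        for rank, (_, _, quantity, unit_price) in enumerate(group, start=1):
            amount = quantity * unit_price  # total amount = price x quantity
            if previous is not None:
                result.append(
                    (customer_id, rank, quantity - previous[0], amount - previous[1])
                )
            previous = (quantity, amount)
    return result


for row in lagged_differences(orders):
    print(row)
```

The resulting difference columns could then be tested against zero in the same way as the first-vs-rest difference.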
Hey Yashas, that's another great solution. I have a strong background in SQL, where a few JOINs are simpler to write than a WINDOW operation, at least in my experience. So my brain is pretty much wired to think in joins rather than windows, even when I don't have to write SQL code 😉
It's interesting that your approach leads to a much simpler Flow where my approach offers simpler recipes. I would prefer your approach for larger projects where you want to keep the Flow manageable, but for small analyses like these, I'd prefer a more elaborate Flow that offers a lot of information at a glance.