We're excited to announce that we're launching the second installment of Dataiku Product Days Register Now

Conundrum 2: Holiday Hoodies

MichaelG
Community Manager
Community Manager
Conundrum 2: Holiday Hoodies

Generic Community Conundrums - header for posts2 (1).png

The second Conundrum is here!

A company has separate datasets for the 2016 and 2015 numbers and needs to compare a subset of 2016’s numbers with the numbers from 2015.

Given the two datasets "orders_2015" and "orders_2016", create a single visualization that answers the question, "Did we sell more Hoodies in 2016 than in 2015 in the first 25 days of December?"

Good luck - I hear there are stacks of clever ways to go about it!

Once you have your answer please export your project and upload it here so we can all learn from each other's efforts! Refer to our Submission Guidelines to see how to properly export your project.

I hope I helped! Do you Know that if I was Useful to you or Did something Outstanding you can Show your appreciation by giving me a KUDOS?

Looking for more resources to help you use DSS effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!
7 Replies
anita-clmnt
Level 3

I stacked the two datasets, kept all the rows where order_date_month=12, order_date_day was between 1 and 25 and tshirt_category='Hoodie'. Then I created a 'sales' column equal to tshirt_price*tshirt_quantity.

For the chart I decided to draw two barplots, one representing the cumulative sum of 'sales' by day and the other one representing the cumulative sum of quantities by day with two different colors for each year.

The 2 last bars thus represent the overall result for the 25 first days of December.

I hope I helped!

ben_p
Neuron
Neuron

I followed a similar path to Anita, importing the two .csv files into a single dataset then grouping and pivoting the results by year. I used the quantity of hoodies sold rather than the revenue in Antia's solution - I think revenue is a better measure though!

I created a chart showing sales by day for December only, year-over-year, however I found this didn't provide the answer really quickly, so I also made a pivot table where I summed up the sales by year and displayed the totals (which I won't post here).

ben_p_0-1587023419996.pngben_p_1-1587023466650.png

 

 

MichaelG
Community Manager
Community Manager
Author

@anita-clmnt @ben_p 

Thank you both for your submissions! I love that you each took a similar yet different path to reach the goal! 

Thanks for taking part and I look forward to seeing more submissions from you both in future 😀

I hope I helped! Do you Know that if I was Useful to you or Did something Outstanding you can Show your appreciation by giving me a KUDOS?

Looking for more resources to help you use DSS effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!
ValerieM
Dataiker Alumni

I determined that, yes, we sold more hoodies Dec 1-25, 2016 (162 hoodie orders) than in Dec 1-25, 2015 (156 hoodie orders). Since the price of hoodie did not change year to year, the revenue in 2015 is also less given fewer orders.

I stacked the datasets to create a single dataset of all orders. I created all the filtering in the chart itself to isolate the dates of interest. 

Liev
Dataiker Alumni

hi @MichaelG another one, this is probably a more straight forward one.

As others, stack, filter for December and days <= 25, then build visualisations.

I built two, one aggregate and one cumulative per day.

One small trick I used was to upload both files into the same dataset, this saves me a stack recipe if the structures are the same already!

taraku
Dataiker
Dataiker

@Liev Nice trick uploading both files into the same dataset! 😁

MRvLuijpen
Neuron
Neuron

Hello @MichaelG .

Here is my response (hopefully not too late).

I did uploaded both files and filtered for Hoodies, December & 1 - 25. Also created Sales as price * quantity.

Made 3 bar charts:

- Amount of transaction (2015 lower then 2016)

- Sales (2015 higher then 2016)

- Total Quantity (2015 higher then 2016), 

Based on these charts, I conclude that the answer is no (total quantity Hoodies sold in 2016 are less then 2015)