I was plotting some data from a webshop sales order dataset. When I plotted the totals by week, one week stood out sharply from all the other weeks across 4 years: week 1 of 2016. Its total was a little more than double that of the second-highest week, so it formed a huge peak compared to the rest of the graph.
I immediately classified this data point as an outlier, so I started investigating it by narrowing the graph down to that week. Surprisingly, when I broke the week down by day, none of the days stood out, and the daily totals did not add up to the amount shown in the graph. So there had to be some other error. When I included a few more dates before and after this week, I noticed that week 1 kept growing as I added days before the calendar's week 1 (January 4th to January 10th).
Then I noticed that week 52 of 2015 only appeared once I included December 27th. When I narrowed the range back to December 28th through January 10th, all of the data was classified as week 1. That is when I realized that 2015 had a week 53.
So there we have it: week 53 of 2015, which runs from December 28th to January 3rd, is grouped with week 1, skewing my graph. But I can't throw away this week's sales, because they are relevant, valid data. Now the high peak is not that surprising: two weeks are combined, and December through February is the busiest period for this organisation.
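To see why these days form a separate week, here is a minimal sketch using Python's standard library, which follows ISO 8601 week numbering: weeks start on Monday, and week 1 is the week containing the year's first Thursday. The helper names `iso_week_label` and `naive_week_label` are hypothetical, for illustration only. The second helper pairs the calendar year with the ISO week number, which shows why a weekly grouping key should use the ISO year.

```python
from datetime import date, timedelta


def iso_week_label(d):
    """Label a date with its ISO 8601 year and week, e.g. '2015-W53'."""
    iso_year, iso_week, _ = d.isocalendar()
    return "%d-W%02d" % (iso_year, iso_week)


def naive_week_label(d):
    """Pitfall: pair the *calendar* year with the ISO week number."""
    _, iso_week, _ = d.isocalendar()
    return "%d-W%02d" % (d.year, iso_week)


start = date(2015, 12, 27)
for offset in range(15):  # December 27th, 2015 through January 10th, 2016
    d = start + timedelta(days=offset)
    print(d.isoformat(), iso_week_label(d), naive_week_label(d))
```

With ISO labels, December 28th, 2015 through January 3rd, 2016 all get 2015-W53, and 2016-W01 starts on Monday, January 4th. With the calendar year, January 1st to 3rd come out as 2016-W53, a week that does not exist in ISO terms (2016 has only 52 ISO weeks).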
How should I deal with this 53rd week so that my graph is correct without throwing away its data? Am I doing something wrong, or is deleting this week's sales rows really the only solution?
Edit: chart grouped by week:
Thanks in advance,
Thank you for sending a sample of the data. I was able to reproduce the issue. It is a bug, which I reported to our R&D team.
In the meantime, let me suggest the following workaround: add a Python processing step to a Prepare recipe, using the following code:
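from datetime import datetime

def process(row):
    # Parse the "created_at" timestamp string, e.g. "2015-12-28T14:00:00.000Z"
    python_date = datetime.strptime(row["created_at"], "%Y-%m-%dT%H:%M:%S.000Z")
    # isocalendar() returns (ISO year, ISO week, ISO weekday); use the ISO year,
    # not the calendar year, so January 1st-3rd, 2016 stay in 2015-W53
    year, weeknumber, _ = python_date.isocalendar()
    # Zero-pad the week so labels sort chronologically, e.g. "2016-W01"
    return "%d-W%02d" % (year, weeknumber)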
This creates a new column with the ISO year and week number (for example, December 28th, 2015 through January 3rd, 2016 all become 2015-W53), which you can use for aggregation in the charts.
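To check the grouping outside Dataiku, here is a minimal plain-Python sketch of the weekly aggregation. It assumes rows shaped like the recipe's input: a `created_at` string in the timestamp format used above, plus a hypothetical `total` field. The sample rows and amounts are made up for illustration, and `week_label` and `weekly_totals` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import datetime


def week_label(created_at):
    # Parse the timestamp and build an ISO year-week label, e.g. "2015-W53"
    d = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%S.000Z")
    iso_year, iso_week, _ = d.isocalendar()
    return "%d-W%02d" % (iso_year, iso_week)


def weekly_totals(rows):
    # Sum each row's "total" under its ISO week label
    totals = defaultdict(float)
    for row in rows:
        totals[week_label(row["created_at"])] += row["total"]
    # Zero-padded labels sort chronologically as plain strings
    return dict(sorted(totals.items()))


# Made-up sample rows for illustration only
rows = [
    {"created_at": "2015-12-27T09:15:00.000Z", "total": 10.0},
    {"created_at": "2015-12-28T14:00:00.000Z", "total": 20.0},
    {"created_at": "2016-01-02T08:30:00.000Z", "total": 30.0},
    {"created_at": "2016-01-04T11:45:00.000Z", "total": 40.0},
    {"created_at": "2016-01-10T17:20:00.000Z", "total": 50.0},
]
print(weekly_totals(rows))
```

The December 28th and January 2nd rows land together in 2015-W53 rather than being merged into 2016-W01. Zero-padding the week number is a deliberate choice: without it, a string sort would put "2016-W10" before "2016-W2".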
Here is a screenshot if that helps: