Aww.... what a wowzer! I can't wait to try to figure this one out!
I've taken a first go at breaking this down. The first job was to parse the date column into an actual date, then I broke out the individual date elements (day, month and year):
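For anyone who wants to try the same step in code, here's a minimal pandas sketch. The column names ("date", "rainfall"), the date format and the three sample rows are all made up for illustration, so adjust them to whatever the real file uses.

```python
import pandas as pd

# Hypothetical sample rows; the column names and values are assumptions,
# not taken from the real dataset.
df = pd.DataFrame({
    "date": ["2015-01-31", "2015-02-28", "2016-12-25"],
    "rainfall": [1.2, 0.0, 3.4],
})

# Parse the text column into real datetimes.
# errors="coerce" turns anything unparseable into NaT instead of raising.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")

# Break out the individual date elements as their own columns.
df["day"] = df["date"].dt.day
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year

print(df)
```

If the real dates are in a different layout (for example day first), the `format` string would need to change to match.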
I selected the month column as my target. Interestingly, the automated setup initially suggested this should be a regression problem, which I then changed to multi-class classification.
I trained this first dataset using the Decision Tree, Logistic Regression and Random Forest algorithms, with Random Forest winning on ROC score:
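Here's a rough scikit-learn sketch of that comparison, in case it helps anyone follow along without the tool. Big caveats: the data is synthetic (I invented a seasonal temperature cycle plus a rainfall column that has nothing to do with the month), the feature names are assumptions, and I'm guessing that "ROC score" means a one-vs-rest ROC AUC averaged over the twelve classes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the real dataset): month drives temperature,
# while rainfall is pure noise.
rng = np.random.default_rng(0)
n = 1200
month = rng.integers(1, 13, size=n)
seasonal = 10 - 8 * np.cos(2 * np.pi * (month - 1) / 12)
X = np.column_stack([
    seasonal + 4 + rng.normal(0, 2, n),  # assumed "max_temp" feature
    seasonal + rng.normal(0, 2, n),      # assumed "mean_temp" feature
    rng.exponential(2.0, n),             # assumed "rainfall" feature
])

X_train, X_test, y_train, y_test = train_test_split(
    X, month, test_size=0.25, stratify=month, random_state=0
)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model and score it with a macro-averaged one-vs-rest ROC AUC.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)
    scores[name] = roc_auc_score(y_test, proba, multi_class="ovr")

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

The ranking on made-up data says nothing about the real result; the point is just the shape of the comparison.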
Now for my favourite part of any modelling process, feature importance!
This suggests that max temperature is the best indicator of the month, which makes sense, right? Interestingly, though, rainfall is actually not a great indicator of the month.
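If you'd like to pull a similar ranking out in code, a Random Forest in scikit-learn exposes impurity-based importances directly. Same caveats as before: this is a sketch on synthetic data with assumed feature names, where I deliberately made rainfall unrelated to the month, so it only shows the mechanics.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (NOT the real dataset): temperature follows a
# seasonal cycle, rainfall is generated independently of the month.
rng = np.random.default_rng(0)
n = 1200
month = rng.integers(1, 13, size=n)
seasonal = 10 - 8 * np.cos(2 * np.pi * (month - 1) / 12)

feature_names = ["max_temp", "mean_temp", "rainfall"]  # assumed names
X = np.column_stack([
    seasonal + 4 + rng.normal(0, 2, n),
    seasonal + rng.normal(0, 2, n),
    rng.exponential(2.0, n),
])

forest = RandomForestClassifier(n_estimators=200, max_depth=4, random_state=0)
forest.fit(X, month)

# feature_importances_ holds one impurity-based score per column; they sum to 1.
importances = dict(zip(feature_names, forest.feature_importances_))
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

for name, value in ranked:
    print(f"{name}: {value:.3f}")
```

Impurity-based scores can flatter noisy continuous features, so it may be worth cross-checking with `sklearn.inspection.permutation_importance` on a held-out set.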
Here's a nice visualisation of the Random Forest split, starting with mean temperature:
Would anyone like to suggest some ways to refine this very simple start?
Ben