Conundrum 9: Month Forecasting

MichaelG · ‎05-25-2020


Welcome to Conundrum 9 - We know your love for modeling, so here's another dataset for you to get stuck into!

Attached is data pertaining to various weather conditions in a certain location for every day of the year 2019. This data includes minimum and maximum temperature, precipitation levels, windspeed and more!

Can you build a model that uses this data to predict the month in which each point was collected? Some important features are likely to be obvious to you, but let's see what the model can find! Bear in mind that, while we have prepared the data somewhat, 'month' isn't currently a column, so you will need to get stuck into data prep yourself.

Of course building the model is just the start - refining is key. Share your methods and most significant features here and together we can reach new heights!

Note: Values of 99.9, 999.9, and 9999.9 indicate a missing reading, so don't go thinking there was 999cm of snow almost every day that year!
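For anyone prepping the data in code rather than in a visual recipe, here is a minimal pandas sketch of turning those sentinel values into proper missing values. The column names and rows are made up for illustration; the real file's columns may well be named differently.

```python
import numpy as np
import pandas as pd

# Sentinel values that stand for a missing reading, per the note above.
MISSING_SENTINELS = [99.9, 999.9, 9999.9]

# Toy rows with hypothetical column names (not taken from the attached file).
df = pd.DataFrame({
    "max_temp": [12.3, 9999.9, 15.1],
    "wind_speed": [4.2, 999.9, 3.8],
    "snow_depth": [999.9, 999.9, 0.0],
})

# Swap every sentinel for NaN so it is not treated as a real measurement.
cleaned = df.replace(MISSING_SENTINELS, np.nan)

# Count how many readings are now flagged as missing in each column.
print(cleaned.isna().sum())
```

A blanket replace like this would also wipe out a genuine reading of exactly 99.9, so checking which sentinel applies to which column and replacing per column is probably the safer route on the real data.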


taraku · ‎06-01-2020

Aww.... what a wowzer! I can't wait to try to figure this one out!

ben_p · ‎06-11-2020

I've taken a first go at breaking this down. The first job was to parse the date column into an actual date; then I broke out the individual date elements (day, month and year):
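For reference, the same date prep could look roughly like this in pandas. This is only a sketch: the column name "date" and the YYYYMMDD format are assumptions, so adjust both to match the actual file.

```python
import pandas as pd

# Toy rows; the "date" column name and its YYYYMMDD format are assumptions.
df = pd.DataFrame({"date": ["20190115", "20190704", "20191231"]})

# Parse the raw strings into real datetimes.
parsed = pd.to_datetime(df["date"], format="%Y%m%d")

# Break out the individual date elements.
df["day"] = parsed.dt.day
df["month"] = parsed.dt.month
df["year"] = parsed.dt.year

print(df)
```

If the parsed date (or anything derived from it, such as day of year) is left among the input features, the model can read the month straight off it, so those columns are worth excluding before training.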

I selected the month column as my target. Interestingly, the automated setup initially suggested this should be a regression problem, which I then changed to multi-class classification.

I trained this first dataset using the Decision Tree, Logistic Regression and Random Forest algorithms, with the last of these winning on ROC score:
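Outside the visual ML tool, a comparable three-way comparison could be sketched with scikit-learn as below. To be clear about the assumptions: the data here is synthetic (a seasonal temperature curve plus noise), the feature set is invented, the hyperparameters are library defaults rather than whatever the tool picked, and the one-vs-rest macro-averaged ROC AUC may not be exactly the metric reported in the tool.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: one row per day of a 365-day year.
days_per_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
y = np.repeat(np.arange(1, 13), days_per_month)  # month label, 1..12
day = np.arange(365)
mean_temp = 10 - 8 * np.cos(2 * np.pi * day / 365) + rng.normal(0, 2, 365)
max_temp = mean_temp + 5 + rng.normal(0, 1, 365)
rain = rng.exponential(2.0, 365)
X = np.column_stack([mean_temp, max_temp, rain])

# Stratify so every month appears in both the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Score each model with one-vs-rest ROC AUC on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)
    scores[name] = roc_auc_score(y_test, proba, multi_class="ovr")

print(scores)
```

With only 365 rows, a single train/test split is fairly noisy, so cross-validated scores would give a steadier ranking of the three algorithms.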

Now for my favourite part of any modelling process, feature importance!

This suggests that the max temperature is the best indicator of the month, which makes sense, right? Interestingly, though, rainfall is actually not a great indicator of the month.
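To reproduce this kind of ranking in code, one option is the impurity-based feature_importances_ attribute of a scikit-learn random forest. The sketch below runs on synthetic data with made-up feature names, so its numbers say nothing about the real dataset, and the chart in the tool may be computed differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data: one row per day, month labels 1..12.
days_per_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
y = np.repeat(np.arange(1, 13), days_per_month)
day = np.arange(365)
max_temp = 15 - 8 * np.cos(2 * np.pi * day / 365) + rng.normal(0, 2, 365)
rain = rng.exponential(2.0, 365)
wind_speed = rng.normal(10, 3, 365)

# Hypothetical feature names, in the same order as the columns of X.
feature_names = ["max_temp", "rain", "wind_speed"]
X = np.column_stack([max_temp, rain, wind_speed])

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Pair each feature with its importance and sort from most to least important.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can flatter features with many distinct values, so permutation importance on held-out data (sklearn.inspection.permutation_importance) is a common cross-check.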

Here's a nice visualisation of the Random Forest split, starting with mean temperature:

Would anyone like to suggest some ways to refine this very simple start?

Ben
