Flight delays: frustrating for sure, but are they predictable?
Attached is a dataset of just under 400,000 domestic US flights. Each flight has some useful data on origin, destination, time of departure, and (vitally for our purposes) whether the flight arrived 15 or more minutes late.
Can you use all this data to predict if a flight will be delayed? Share your most significant features in the comments!
PS: before you open the data, ask yourself what percentage of flights you think will have a delay of 15 minutes or more. Test your intuition before you test your DSS skills!
10% is a pretty good guess though!
With the imbalanced target class, do we think it's advisable to actually use all of the data? It took my little 2-core laptop between 16 and 50 minutes to compute some of the basic models. I'm going to start with a sample for my early models to cut the compute time for each trial.
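One caveat with sampling an imbalanced target: a plain random sample can shift the share of delayed flights, especially if the sample is small. Here is a rough sketch of a stratified sample in plain Python that draws the same fraction from each class. The toy rows and the `delayed` and `flight_id` field names are made up for illustration (I'm not assuming they match the real column names), and `stratified_sample` is just a helper I wrote, not something built into DSS.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_key, fraction, seed=0):
    """Draw roughly `fraction` of the rows from each class separately,
    so the class ratio in the sample matches the full data."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    # Sample the same fraction from every class (at least one row each).
    sample = []
    for group in by_class.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))

    rng.shuffle(sample)
    return sample


# Toy stand-in for the flights data: 10,000 rows, 10% marked as delayed.
# "delayed" is a hypothetical column name, not the dataset's actual one.
flights = [{"flight_id": i, "delayed": int(i % 10 == 0)} for i in range(10_000)]

subset = stratified_sample(flights, "delayed", fraction=0.1)
delay_rate = sum(r["delayed"] for r in subset) / len(subset)
print(len(subset), delay_rate)
```

Once the early models look reasonable on the sample, the same pipeline can be rerun on the full data for the final numbers.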