Today we have a pure modelling challenge for you!
Attached is a large dataset of bookings at two hotels: one in a major city, the other in a resort area. Ideas may immediately come to mind about how these hotels would differ, but how well can your intuitions (together with Dataiku DSS’s powerful modelling abilities) predict which hotel each booking is for?
Try your hand at the puzzle by using the ‘hotel’ column as the target and building the best model you can!
Thanks to Jesse Mostipak over on Kaggle for sharing this dataset and allowing its use under a Creative Commons licence. Note that we have made edits to the dataset to enable you to spend less time cleaning and more time modelling!
This is a lovely dataset for model building.
Classification of the hotel column is fairly straightforward. I used a flow like this to prep and test my model.
When it comes to features, there are several useful and fun ones you can create.
I created a feature called reserved_assigned_different that flags whether reserved_room_type and assigned_room_type differ. If the values are not the same, the booking gets a 1 in this new column.
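For anyone who wants to try the same flag outside of a visual prepare step, here is a minimal sketch in plain Python. It is not the exact recipe I used in my flow. The function name add_reserved_assigned_different and the sample bookings are made up for illustration, and I'm assuming a 0 when the two room types match.

```python
def add_reserved_assigned_different(rows):
    """Return copies of the rows with a 1/0 flag for mismatched room types."""
    flagged = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        # 1 when the assigned room differs from the reserved one.
        # Assumption: matching room types get a 0.
        new_row["reserved_assigned_different"] = int(
            row["reserved_room_type"] != row["assigned_room_type"]
        )
        flagged.append(new_row)
    return flagged


# Made-up sample bookings, not taken from the challenge dataset.
bookings = [
    {"reserved_room_type": "A", "assigned_room_type": "A"},
    {"reserved_room_type": "A", "assigned_room_type": "D"},
]
flagged_bookings = add_reserved_assigned_different(bookings)
print([b["reserved_assigned_different"] for b in flagged_bookings])
```

The first booking got the room it reserved and the second did not, so only the second one is flagged.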
I also used feature generation to quickly experiment with pairwise polynomial combinations.
I found a number of options that interested me. Maybe my favorite is adults * children. This makes it explicit if you have a family situation.
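To show what a pairwise combination like adults * children looks like, here is a rough sketch in plain Python. It only illustrates the idea and is not how DSS builds these features internally. The helper add_pairwise_products, the "_x_" naming, the babies column, and the sample rows are all assumptions for the example.

```python
from itertools import combinations


def add_pairwise_products(rows, columns):
    """Return copies of the rows with one product column per pair of columns."""
    expanded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for left, right in combinations(columns, 2):
            # Hypothetical naming scheme: "<left>_x_<right>"
            new_row[f"{left}_x_{right}"] = row[left] * row[right]
        expanded.append(new_row)
    return expanded


# Made-up sample bookings; "babies" is an assumed column name.
bookings = [
    {"adults": 2, "children": 0, "babies": 0},
    {"adults": 2, "children": 2, "babies": 1},
]
expanded = add_pairwise_products(bookings, ["adults", "children", "babies"])
print([b["adults_x_children"] for b in expanded])
```

The product is zero whenever a booking has no children (or no adults), so any non-zero value marks a booking with both, which is what makes the family situation explicit.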
I also used the automatic feature reduction options. Here is a look at some of the top features by variable importance.
Here are some of the results I was getting during model training.
And when I ran the model on unseen data, I got the following results.
How have others done? What kinds of features are you finding to be interesting and useful? Have you been able to do better in your scores?
I'd like to invite you to share your results so that we can all learn.