
Conundrum 22: Hotel Bookings

Community Manager


Today we have a pure modelling challenge for you! 

Attached is a large dataset of bookings at two hotels: one in a major city, the other in a resort area. Ideas may immediately come to mind about how these hotels differ, but how well can your intuitions (combined with Dataiku DSS’s powerful modelling abilities) predict which hotel each booking is for?

Try your hand at the puzzle by using the ‘hotel’ column as the target and building the best model you can! 

Thanks to Jesse Mostipak over on Kaggle for sharing this dataset and allowing its use under a Creative Commons licence. Note that we have made edits to the dataset so that you can spend less time cleaning and more time modelling!

Neuron

This is a lovely dataset for model building.

Classification of the hotel column is fairly straightforward. I used a flow like this to prep the data and test my model.

Hotel Model.jpg

When it comes to features, there are several useful and fun ones you can create.

I created a feature called reserved_assigned_different that flags whether reserved_room_type and assigned_room_type differ. If the values are not the same, the booking gets a 1 in this new column.

Reserved_Assign_Difference.jpg
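
For anyone who prefers to see the logic written out, here is a minimal sketch of that flag in plain Python. It is not the exact step from my flow, just an illustration of the comparison. The function name and the sample rows are made up for the example, and I'm assuming each booking is available as a dictionary keyed by column name.

```python
def reserved_assigned_different(row):
    """Return 1 when the assigned room type differs from the reserved one, else 0."""
    # Cast to str and strip whitespace so "A " and "A" are treated as the same type.
    reserved = str(row.get("reserved_room_type", "")).strip()
    assigned = str(row.get("assigned_room_type", "")).strip()
    return int(reserved != assigned)


# Hypothetical sample rows, invented for illustration only.
sample_bookings = [
    {"reserved_room_type": "A", "assigned_room_type": "A"},
    {"reserved_room_type": "A", "assigned_room_type": "D"},
]

for booking in sample_bookings:
    booking["reserved_assigned_different"] = reserved_assigned_different(booking)
```

The same comparison could also be written as a formula in a prepare step; the Python version just makes the rule easy to read and test.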

 

I also used feature generation to quickly experiment with pairwise polynomial combinations.

I found a number of options that interested me. My favorite may be adults * children, which makes it explicit when a booking is for a family.

Do we have a family?
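
To show why this interaction works as a family flag, here is a rough sketch of the raw product in plain Python. The helper names are hypothetical, and treating empty or missing counts as 0 is my assumption. What DSS computes internally for its pairwise combinations may differ (for example if inputs are rescaled first).

```python
def to_count(value):
    """Convert a cell to an int, treating empty or missing values as 0 (an assumption)."""
    if value is None or str(value).strip() == "":
        return 0
    # Go through float so that values such as "2.0" are accepted.
    return int(float(value))


def adults_times_children(row):
    """Raw product of the adults and children counts for one booking."""
    return to_count(row.get("adults")) * to_count(row.get("children"))


# Hypothetical bookings, invented for illustration only.
couple = {"adults": 2, "children": 0}
family = {"adults": 2, "children": "1"}
unknown_children = {"adults": 1, "children": ""}

couple_score = adults_times_children(couple)
family_score = adults_times_children(family)
unknown_score = adults_times_children(unknown_children)
```

The product is non-zero only when a booking has at least one adult and at least one child, which is what makes it read as a "family" signal.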

 

I also used the automatic feature reduction options. Here is a look at some of the top features by variable importance.

Variable Importance.jpg

 

Here are some of the results I was getting during model training.

Model Training.jpg

 

When I ran the model on unseen data, I got the following results.

Hotel Results.jpg

 

How have others done? What kinds of features are you finding interesting and useful? Have you been able to improve on these scores?

I'd like to invite you to share your results so that we can all learn.

 

--Tom