
Conundrum 22: Hotel Bookings

Dataiker Alumni

Today we have a pure modelling challenge for you! 

Attached is a large dataset of bookings at two hotels: one in a major city, the other in a resort area. Ideas may immediately come to mind about how bookings at these hotels would differ, but how well can your intuitions, combined with Dataiku DSS’s powerful modelling abilities, predict which hotel each booking is for?

Try your hand at the puzzle by using the ‘hotel’ column as the target and building the best model you can! 

Thanks to Jesse Mostipak over on Kaggle for sharing this dataset and allowing its use under a Creative Commons licence. Note that we have made edits to the dataset to enable you to spend less time cleaning and more time modelling!

1 Reply

This is a lovely dataset for model building.

Classification of the hotel column is fairly straightforward. I used a flow like this to prep and test my model.

Hotel Model.jpg

When it comes to features, there were several useful and fun ones to create.

I created a feature called reserved_assigned_different that checks whether reserved_room_type and assigned_room_type differ. If the two values are not the same, the booking gets a 1 in this new column.
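For anyone who prefers a code recipe, here is a minimal pandas sketch of the same flag. The column names reserved_room_type and assigned_room_type come from the dataset; the small sample frame and the function name add_reserved_assigned_different are made up for illustration, and this is not necessarily how the flag was built in the flow above.

```python
import pandas as pd


def add_reserved_assigned_different(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 0/1 flag for a room type mismatch."""
    out = df.copy()
    # 1 when the assigned room type differs from the reserved one, else 0.
    out["reserved_assigned_different"] = (
        out["reserved_room_type"] != out["assigned_room_type"]
    ).astype(int)
    return out


# Hypothetical sample rows, only to show the shape of the result.
sample = pd.DataFrame(
    {
        "reserved_room_type": ["A", "A", "D"],
        "assigned_room_type": ["A", "C", "D"],
    }
)
flagged = add_reserved_assigned_different(sample)
print(flagged["reserved_assigned_different"].tolist())
```

Working on a copy keeps the input frame unchanged, which makes the step easy to rerun while experimenting.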



I also used Feature generation to quickly experiment with Pairwise polynomial combinations.

I found a number of options that interested me. My favorite may be adults * children, which makes it explicit when a booking is for a family.

Do we have a family?
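As a rough illustration of what the adults * children combination captures, here is a small pandas sketch. The adults and children columns are from the dataset; the sample rows, the adults_x_children name, and the extra is_family flag are assumptions added for the example, not output from Dataiku's feature generation.

```python
import pandas as pd

# Hypothetical sample rows; "adults" and "children" are dataset columns.
bookings = pd.DataFrame({"adults": [2, 2, 1, 0], "children": [0, 2, 1, 3]})

# Pairwise product: non-zero only when both adults and children are present.
bookings["adults_x_children"] = bookings["adults"] * bookings["children"]

# Optional 0/1 flag derived from the product (illustrative addition).
bookings["is_family"] = (bookings["adults_x_children"] > 0).astype(int)
print(bookings)
```

The product is zero whenever either count is zero, so a tree-based model can separate family bookings with a single split on this column.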


I also used the auto feature reduction options. Here is a look at some of the top features by variable importance.

Variable Importance.jpg


Here are some of the results I was getting during Model Training.

Model Training.jpg


And when I ran the model on unseen data, I got the following results.

Hotel Results.jpg


How have others done? What kinds of features are you finding to be interesting and useful? Have you been able to get better scores?

I'd like to invite you to share your results so that we can all learn.