Welcome to the thirteenth Community Conundrum! This week, let's open things up with a more flexible question.
Provided is a dataset with detailed information on a huge number of football players from the FIFA database. Each player's salary is listed, as well as a *long* list of skills, each rated out of 100, and a few details about the player's position and physical condition.
Can you create a flow that outputs a value proposition that compares each player to their wage?
This is of course highly subjective - do you care about certain positions more? What skill should have priority? We leave these up to you!
If there's one thing an interesting dataset and a good football match have in common, it's that they're both good at keeping everyone safe at home during the pandemic. In all honesty, I'm no data scientist, nor a dev guru. I only recently got exposed to Machine Learning and Artificial Intelligence in general, working with them in Dataiku for a little over 3 months now, so here's my take on the FIFA conundrum challenge.
Since the challenge is not to 'predict' anything, but rather to group/cluster the players' skill sets in relation to their wages, I treated it as a clustering problem.
- Here's what my current flow looks like. Don't mind the 2 additional datasets; they're merely exported from the existing model so that I can explore them further later on.
- And here's how I went about the prepare recipes, nothing out of the ordinary: just converting categorical values to numerical ones and filling the 'NaN' values with the median, while grouping the steps for better clarity in case I ever need to go back and revise anything.
- For the modeling/training step, I chose 'Interactive Clustering', which in return gave me a sufficient score.
- As for the cluster names, I simply labeled them like grades, starting from 'Grading A' for the top-notch performers, all the way down to 'Grading E' for the least performing ones.
- And here's what my cluster plot looks like. Obviously, the better the grade, the fewer players fall into it.
Acceleration x Wage
Sliding Tackle x Wage
- And for sure, those who sit at the 'Grading A' level stand above the average threshold (though that's not always the case with the other variables, as I'm about to show you down below).
- Coming back to the initial question, "creating a flow that outputs a value proposition in terms of their wages": I didn't include the players' names and nationalities in my modeling, for a couple of reasons. In my opinion, those two variables are just way too subjective to include. In a sense, you could be a top-notch player regardless of what your 'Name' sounds like, and of course your 'Nationality'.
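For anyone who wants to see the preparation and naming logic outside of DSS, here's a minimal Python sketch of the same ideas: turning a categorical column into numbers, filling missing values with the median, and naming clusters from 'Grading A' downward. In my flow this was all done with visual recipes, so the function names, the sample values, and the choice to rank clusters by a mean score are assumptions for illustration only.

```python
from statistics import median

def encode_categories(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]

def fill_with_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    mid = median(observed)
    return [mid if v is None else v for v in values]

def grade_clusters(cluster_scores):
    """Name clusters 'Grading A', 'Grading B', ... from the highest mean score down."""
    ranked = sorted(cluster_scores, key=cluster_scores.get, reverse=True)
    return {cluster: f"Grading {chr(ord('A') + i)}" for i, cluster in enumerate(ranked)}

# Hypothetical sample values, not taken from the FIFA dataset
positions = ["ST", "GK", "ST", "CB"]
acceleration = [90, None, 70, 50]

print(encode_categories(positions))
print(fill_with_median(acceleration))
print(grade_clusters({0: 61.5, 1: 84.2, 2: 70.1}))
```

Integer codes like these impose an arbitrary order on the categories, so for a distance-based clustering a one-hot style encoding may be a safer choice.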
So that's the DSS flow done, and the following is my list of 'value propositions' that contribute to being a 'Grading A' player on the field.
Top 5 Value Propositions
Top 5 Value Propositions by Distribution
Top 5 Value Propositions by Grade
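One simple way to get at a ranking like this in code is to compare each skill's average inside a grade with its average across all players, and keep the skills with the largest gap. This is only a sketch of that idea, not necessarily how the charts above were produced; the function name, field names, and sample players are all made up for illustration.

```python
from statistics import mean

def top_skills_for_grade(players, grade, skills, n=5):
    """Rank skills by how far the grade's mean exceeds the mean over all players."""
    in_grade = [p for p in players if p["grade"] == grade]
    gaps = {
        s: mean(p[s] for p in in_grade) - mean(p[s] for p in players)
        for s in skills
    }
    return sorted(gaps, key=gaps.get, reverse=True)[:n]

# Hypothetical players, not taken from the FIFA dataset
players = [
    {"grade": "Grading A", "acceleration": 90, "sliding_tackle": 40, "finishing": 88},
    {"grade": "Grading A", "acceleration": 86, "sliding_tackle": 50, "finishing": 92},
    {"grade": "Grading E", "acceleration": 60, "sliding_tackle": 45, "finishing": 50},
    {"grade": "Grading E", "acceleration": 64, "sliding_tackle": 65, "finishing": 50},
]

skills = ["acceleration", "sliding_tackle", "finishing"]
print(top_skills_for_grade(players, "Grading A", skills, n=2))
```

In this made-up sample, the 'Grading A' players sit below the overall average on sliding tackle, which is the kind of exception to the "Grade A is above average" pattern I mentioned earlier.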
I really enjoyed exploring this dataset, and it was certainly fun doing it. Stay safe, everyone! 😊