Kicking off my first EDA

4 min read 713 words By Samir Paudel
EDA data-science FIFA

“Three days of coding, one UCL match, and endless coffees later…”

created with ideogram.ai

Well! I tried combining my love for football with data science.

Disclaimer: I’m not a pro data scientist, at least not yet. Just a football fan who took a deep dive into machine learning. Expect some twists, turns, and a few ‘Wait, what?’ moments along the way.

The Journey Map

Day 1: Data Cleaning → Day 2: EDA + UCL Magic → Day 3: Final Insights

Day 1: Operation Clean-Up

Before:

Before Cleaning of FIFA data

After:

Starting with the basics — data cleaning. Already being familiar with pandas and numpy made this part surprisingly smooth.

I organized the data on 19,000+ players (yes, I counted) into clear categories:

  • Player Name
  • Club
  • Skills
  • Body Stats
  • Player Finance
  • Player International

Day 2: EDA day → The Question Marathon (feat. An Epic UCL Night)

Barca 4 / Bayern 1

It’s 1 AM, and here I am, deep into analyzing emerging players from 2022 (yes, really). Suddenly, Barcelona vs. Bayern Munich kicks off in the Champions League, a match I’d been waiting a long time to watch. What a game it turned out to be! And Pedri? Absolutely astonishing. Easily one of the best matches I’ve seen in a long time.


Post-match, fueled by football excitement, I tackled these burning questions:

1. Built the “Moneyball Dream Team”: the best players rated above 75 at the lowest possible price (there are some serious bargains out there!)
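A rough sketch of how such a bargain hunt could look in pandas. This is not necessarily how the notebook does it: `moneyball_team` is a hypothetical helper, the `overall` and `value_eur` column names are assumed from the Kaggle dataset, and dropping players with a zero or missing value is my own assumption.

```python
import pandas as pd


def moneyball_team(df, min_overall=75, size=11):
    """Cheapest `size` players rated above `min_overall`.

    Ties on price are broken in favour of the higher-rated player.
    """
    # Assumption: a value of 0 or NaN means "no price listed", so skip those.
    pool = df[(df["overall"] > min_overall) & (df["value_eur"] > 0)]
    ordered = pool.sort_values(["value_eur", "overall"], ascending=[True, False])
    return ordered.head(size)


# Made-up players, just to show the behaviour.
sample = pd.DataFrame({
    "short_name": ["A", "B", "C", "D", "E"],
    "overall": [80, 76, 74, 90, 78],
    "value_eur": [5_000_000, 1_000_000, 500_000, 100_000_000, 2_000_000],
})
team = moneyball_team(sample, size=2)
print(team[["short_name", "overall", "value_eur"]])
```

Player C is the cheapest in the sample but is filtered out for being rated below the threshold, which is the whole point of the exercise.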


2. Found out which countries are producing the next generation of stars (Estonia, Burkina Faso, and Egypt, surprisingly!)

3. Discovered the most and least balanced players rated above 75 overall (the most balanced are mostly midfielders, as it turns out)
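“Balanced” can be measured in more than one way. One simple option, and this is my assumption rather than a standard definition, is the spread (standard deviation) across the six headline attributes: the smaller the spread, the more balanced the player. The sketch below assumes the columns are called `pace`, `shooting`, `passing`, `dribbling`, `defending` and `physic`; `rank_by_balance` is a hypothetical helper name.

```python
import pandas as pd

# Assumed names of the six headline attributes in the dataset.
SKILLS = ["pace", "shooting", "passing", "dribbling", "defending", "physic"]


def rank_by_balance(df, min_overall=75, skills=SKILLS):
    """Sort players rated above `min_overall` from most to least balanced."""
    # Rows with missing attributes (goalkeepers, for instance) are dropped.
    top = df[df["overall"] > min_overall].dropna(subset=skills).copy()
    # Low standard deviation across attributes = evenly spread skills.
    top["skill_spread"] = top[skills].std(axis=1)
    return top.sort_values("skill_spread")


# Made-up players: an all-rounder, a specialist, and one below the cut.
sample = pd.DataFrame({
    "short_name": ["Midfielder M", "Striker S", "Reserve R"],
    "overall": [82, 85, 70],
    "pace": [80, 90, 60],
    "shooting": [80, 90, 60],
    "passing": [80, 70, 60],
    "dribbling": [80, 85, 60],
    "defending": [80, 35, 60],
    "physic": [80, 70, 60],
})
ranked = rank_by_balance(sample)
print(ranked[["short_name", "skill_spread"]])
```

The first rows of the result are the most balanced players and the last rows are the least balanced.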

4. Investigated how club loyalty affects player ratings (spoiler: it matters more in top leagues like the Premier League, La Liga, and the Bundesliga)

days in club vs overall rating
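For the curious, here is a hedged sketch of how “days in club” could be derived and compared with the overall rating, league by league. It assumes a `club_joined` date column and a `league_name` column; the reference date `as_of` and the helper name `loyalty_correlation` are placeholders I picked for illustration.

```python
import pandas as pd


def loyalty_correlation(df, as_of="2021-09-01"):
    """Pearson correlation between days at the club and overall, per league."""
    # Unparseable or missing join dates become NaT and are dropped below.
    joined = pd.to_datetime(df["club_joined"], errors="coerce")
    days = (pd.Timestamp(as_of) - joined).dt.days
    data = df.assign(days_in_club=days).dropna(subset=["days_in_club", "overall"])
    return pd.Series({
        league: grp["days_in_club"].corr(grp["overall"])
        for league, grp in data.groupby("league_name")
    })


# Made-up data: loyalty "pays off" in League A and does the opposite in League B.
sample = pd.DataFrame({
    "league_name": ["League A"] * 3 + ["League B"] * 3,
    "club_joined": ["2020-09-01", "2019-09-01", "2018-09-01"] * 2,
    "overall": [70, 75, 80, 80, 75, 70],
})
result = loyalty_correlation(sample)
print(result)
```

A correlation only says the two numbers move together. It does not say that staying longer makes a player better, since better players may simply be kept longer.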

5. Analyzed Premier League clubs’ (top 5 and bottom 5) work rates vs. finances (fascinating correlation there)

Manchester City has the highest work rate along with the highest wages and squad value. Chelsea, despite being one of the best teams, has a surprisingly low work rate, unlike Leeds United.

6. Took a deep dive into Brazilian clubs having the highest percentage of home-country players


7. Which leagues hold the most market value in players?
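This one boils down to a group-and-sum. A minimal sketch, assuming `league_name` and `value_eur` columns (the helper name `league_market_value` is made up):

```python
import pandas as pd


def league_market_value(df):
    """Total player market value per league, richest league first."""
    totals = df.groupby("league_name")["value_eur"].sum()
    return totals.sort_values(ascending=False)


# Made-up values, only to show the shape of the result.
sample = pd.DataFrame({
    "league_name": ["League A", "League B", "League A", "League B"],
    "value_eur": [10_000_000, 40_000_000, 5_000_000, 1_000_000],
})
ranking = league_market_value(sample)
print(ranking)
```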


8. Compared Ramos vs. Van Dijk (sorry VVD fans, but Ramos’s sliding tackle game is unmatched)


Ramos is better at passing and dribbling, which helps the midfield in attack, and he defends with his favourite sliding tackle, haha. Van Dijk is the better defender: his defending, pace, and physical ratings are all higher than Ramos’s. Who won? I can’t tell; they’re both GOATs for me.

9. Left foot vs. right foot

Right-footed players outnumber left-footed players.

Day 3: Getting Technical — Final Insights

The ML Journey:

My_Brain = {
    "Morning": "What is sklearn?",
    "Afternoon": "Maybe I'm getting it",
    "Evening": "Let's leave it for 100daysOfML"
}

First time using sklearn — shoutout to the internet for the 3-hour crash course!

Didn’t quite finish it, but no worries, I’ll be back for more soon ❤.

Hidden Patterns:

Applied K-means clustering to find the basic patterns (GitHub Copilot helped me with this one 😂):


The higher the shooting and dribbling skills, the better the player.
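A minimal K-means sketch with scikit-learn, run on synthetic shooting and dribbling numbers rather than the real data. The scaling step, the number of clusters, and the helper name `cluster_players` are my assumptions and not necessarily what the notebook does.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_players(features, n_clusters=2, seed=0):
    """Scale the features, then assign each row to a K-means cluster."""
    # Scaling keeps one attribute from dominating the distance calculation.
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled)
    return labels, model


# Synthetic (shooting, dribbling) pairs: 20 weaker and 20 stronger attackers.
rng = np.random.default_rng(0)
weak = rng.normal(loc=45, scale=3, size=(20, 2))
strong = rng.normal(loc=85, scale=3, size=(20, 2))
features = np.vstack([weak, strong])

labels, model = cluster_players(features)
print(labels)
```

On the real dataset the groups will not be this cleanly separated, so it is worth trying a few values of `n_clusters` and looking at the plots before trusting any one grouping.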

Through PCA and clustering:

  • Mbappé emerged as a statistical outlier
  • Most players share common skill patterns
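One way to reproduce this kind of outlier hunt is to project the skill columns onto two principal components and flag players who sit unusually far from the centre. The sketch below uses synthetic numbers; the distance-based rule, the threshold of 3, and the helper name `pca_outliers` are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_outliers(features, n_components=2, z_threshold=3.0):
    """Project onto principal components and flag unusually distant rows."""
    scaled = StandardScaler().fit_transform(features)
    coords = PCA(n_components=n_components).fit_transform(scaled)
    # Distance from the origin of the PCA plane (the data is centred there).
    dist = np.linalg.norm(coords, axis=1)
    z_scores = (dist - dist.mean()) / dist.std()
    return coords, np.flatnonzero(z_scores > z_threshold)


# 50 synthetic "typical" players with four skills, plus one superstar at the end.
rng = np.random.default_rng(1)
typical = rng.normal(loc=60, scale=5, size=(50, 4))
superstar = np.array([[97, 95, 92, 96]])
features = np.vstack([typical, superstar])

coords, outliers = pca_outliers(features)
print(outliers)  # row indices flagged as outliers
```

Most rows land in one dense cloud, which matches the "common skill patterns" observation, while the superstar row sits far away from it.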

And finally, the GOAT

  • Messi tops the charts by rating again, the king of football
  • Lewandowski is a close second, scoring 50 goals a season, a true goal machine
  • Ronaldo still holds 3rd place despite his age, truly a legend

But let’s be real, no spreadsheet is gonna settle the GOAT debate. Whether you’re chanting for Messi or Ronaldo, these stats won’t change the fact that every fan has their own GOAT. So let’s just agree on one thing: we’re all lucky to watch these legends do their thing. 🙌

Here is my GOAT for you: ANTONY the legend 😅


Dataset: [FIFA 22 complete player dataset](https://www.kaggle.com/datasets/stefanoleone992/fifa-22-complete-player-dataset) (19k+ players, 100+ attributes extracted from the latest edition of FIFA)

  • Analysis GitHub Repo: (you can use the notebook from github)

Here is the final documentation of my EDA project:

captionless image

So yes, that’s it for today. Thank you!