European Soccer Analytics Platform

Full-Stack Analytics with MongoDB, Flask, and Machine Learning

MongoDB · Flask · Scikit-learn · Gradient Boosting · Docker

DS-5760 NoSQL for Modern Data Science | Vanderbilt University

Built a comprehensive soccer analytics platform that transforms a relational SQLite database into a MongoDB document store for analyzing 25,979 matches from 11 European leagues. Includes an interactive Flask web interface with 7 analytical queries and a Gradient Boosting model for match outcome prediction.

Platform Scale

  • 25,979 matches analyzed
  • 11 European leagues
  • 11,060 players tracked
  • 50% ML prediction accuracy

Key Features

7 Interactive Analytical Queries

  • Team Performance by Season (league standings)
  • Home vs Away Performance Analysis
  • Head-to-Head Historical Records
  • Player Appearance Frequency
  • Team Form Analysis (recent momentum)
  • High-Scoring vs Low-Scoring Teams
  • Team Attributes Correlation with Success
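As an illustration of how one of these queries might be expressed, here is a minimal sketch of an aggregation pipeline for the home side of the Home vs Away analysis. The field names (`league`, `season`, `home_team.name`, `home_goals`, `away_goals`) are assumptions about the document shape, not the project's confirmed schema.

```python
def home_performance_pipeline(league: str, season: str) -> list[dict]:
    """Build a pipeline summarising each team's home record for one league season."""
    return [
        # Keep only the requested league and season (assumed field names).
        {"$match": {"league": league, "season": season}},
        # One output row per home team.
        {"$group": {
            "_id": "$home_team.name",
            "matches": {"$sum": 1},
            # Count a win whenever the home side outscored the visitors.
            "wins": {"$sum": {"$cond": [
                {"$gt": ["$home_goals", "$away_goals"]}, 1, 0,
            ]}},
            "avg_goals_for": {"$avg": "$home_goals"},
        }},
        # Most home wins first, ties broken by average goals scored.
        {"$sort": {"wins": -1, "avg_goals_for": -1}},
    ]
```

With PyMongo, a pipeline like this would be passed to `aggregate` on the matches collection, for example `db.matches.aggregate(home_performance_pipeline("England Premier League", "2015/2016"))`, where `db.matches` is a hypothetical collection name.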

Machine Learning Match Prediction

  • Gradient Boosting model (best of 3 tested algorithms)
  • 50% accuracy (typical for soccer prediction)
  • 84% recall for home wins
  • 15 features including team ratings and recent form
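To make the two headline metrics concrete, the sketch below computes overall accuracy and per-class recall by hand. The labels (`H` home win, `D` draw, `A` away win) and the toy predictions are invented for illustration and are not the project's results; they only show how a model that leans toward predicting home wins can combine high home-win recall with modest overall accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of matches whose predicted outcome equals the actual outcome."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, label):
    """Of the matches that actually ended in `label`, the fraction predicted as `label`."""
    predictions_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    if not predictions_for_label:
        return 0.0
    return sum(p == label for p in predictions_for_label) / len(predictions_for_label)


# Toy data, invented for illustration only.
y_true = ["H", "H", "H", "H", "D", "D", "A", "A"]
y_pred = ["H", "H", "H", "A", "H", "H", "A", "H"]

print(accuracy(y_true, y_pred))     # 0.5
print(recall(y_true, y_pred, "H"))  # 0.75
print(recall(y_true, y_pred, "D"))  # 0.0
```

In this toy case the model never predicts a draw, so draw recall is zero even though home-win recall is high.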

ML Key Finding: Recent Form Matters Most

Feature-importance analysis showed that recent form (22%) is the strongest predictor, outweighing static FIFA ratings. This supports the value of temporal data in sports prediction.

  • 22%: form difference (most important)
  • 7.9%: defensive rating difference
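A form-difference feature could be derived along the following lines. This is a sketch under stated assumptions: the five-match window and the 3/1/0 points scheme are illustrative choices, and the source does not say how the project actually computes form.

```python
def form_points(results, last_n=5):
    """Sum league points over the most recent `last_n` results.

    `results` is ordered oldest first and uses 'W', 'D', 'L'.
    The 3/1/0 scoring and the window size are assumptions.
    """
    points = {"W": 3, "D": 1, "L": 0}
    return sum(points[r] for r in results[-last_n:])


def form_difference(home_results, away_results, last_n=5):
    """Home team's recent points minus the away team's recent points."""
    return form_points(home_results, last_n) - form_points(away_results, last_n)


# Results strictly before kickoff, oldest first (invented example data).
home = ["W", "W", "D", "L", "W", "W"]
away = ["L", "D", "D", "W", "L"]
print(form_difference(home, away))  # 5
```

Only results from before the match being predicted should feed the feature; including the match itself would leak the outcome into the input.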

MongoDB Design Decisions

  • Denormalized documents: Team and player info embedded in match documents for faster reads
  • Temporal attributes: Player/team ratings stored as arrays within documents
  • Strategic indexing: Indexes on date, league, season, team names for optimized queries
  • Flexible schema: Naturally handles varying match events (goals, cards, substitutions)
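The denormalization and indexing decisions above can be sketched as follows. The table and column names are assumptions about the SQLite source, the document shape is hypothetical, and the choice of a compound league and season index is an illustration rather than the project's confirmed index set.

```python
import sqlite3


def build_match_documents(conn):
    """Join each match to its league and both teams, emitting one document per match."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        """
        SELECT m.match_api_id, m.season, m.date,
               m.home_team_goal, m.away_team_goal,
               l.name AS league_name,
               h.team_long_name AS home_name,
               a.team_long_name AS away_name
        FROM "Match" AS m
        JOIN League AS l ON l.id = m.league_id
        JOIN Team AS h ON h.team_api_id = m.home_team_api_id
        JOIN Team AS a ON a.team_api_id = m.away_team_api_id
        """
    )
    return [
        {
            "_id": row["match_api_id"],
            "league": row["league_name"],
            "season": row["season"],
            "date": row["date"],
            # Team info is embedded so reads need no further lookups.
            "home_team": {"name": row["home_name"]},
            "away_team": {"name": row["away_name"]},
            "home_goals": row["home_team_goal"],
            "away_goals": row["away_team_goal"],
        }
        for row in rows
    ]


# Index keys as (field, direction) pairs; 1 means ascending.
INDEX_SPECS = [
    [("date", 1)],
    [("league", 1), ("season", 1)],
    [("home_team.name", 1)],
    [("away_team.name", 1)],
]


def ensure_indexes(collection):
    """Create each index; `collection` should behave like a PyMongo collection."""
    for keys in INDEX_SPECS:
        collection.create_index(keys)
```

The trade-off is the usual one for embedding: team names are duplicated across match documents, which costs storage and makes renames more expensive, in exchange for reads that touch a single document.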

Dataset: Kaggle European Soccer Database

Comprehensive dataset spanning 2008-2016 across 11 European leagues:

England Premier League · Spain La Liga · Germany Bundesliga · Italy Serie A · France Ligue 1 · Netherlands Eredivisie

Plus: Portugal, Poland, Scotland, Belgium, Switzerland

Flask Web Application

  • 17 routes
  • 7 query pages
  • 7 API endpoints
  • 3K+ lines of code

Technologies Used

MongoDB · PyMongo · Flask · Python · Scikit-learn · Gradient Boosting · Pandas · Docker · Jinja2 · REST APIs

Learning Outcomes

  • NoSQL database design with denormalization strategies
  • Complex MongoDB aggregation pipelines
  • Feature engineering from database queries
  • RESTful API design with Flask
  • Full-stack development with AJAX dynamic loading