European Soccer Analytics Platform
Full-Stack Analytics with MongoDB, Flask, and Machine Learning
MongoDB · Flask · Scikit-learn · Gradient Boosting · Docker
DS-5760 NoSQL for Modern Data Science | Vanderbilt University
Built a comprehensive soccer analytics platform that transforms a relational SQLite database into a MongoDB document store for analyzing 25,979 matches from 11 European leagues. Includes an interactive Flask web interface with 7 analytical queries and a Gradient Boosting model for match outcome prediction.
Platform Scale
- 25,979 matches analyzed
- 11 European leagues
- 11,060 players tracked
- 50% ML prediction accuracy
Key Features
7 Interactive Analytical Queries
- Team Performance by Season (league standings)
- Home vs Away Performance Analysis
- Head-to-Head Historical Records
- Player Appearance Frequency
- Team Form Analysis (recent momentum)
- High-Scoring vs Low-Scoring Teams
- Team Attributes Correlation with Success
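Here is one way to express the first query (league standings) as a MongoDB aggregation pipeline, built as plain Python data. It is a sketch, not the project's own pipeline. It assumes a hypothetical `matches` collection whose documents carry `league`, `season`, `home_team.name`, `away_team.name`, `home_team_goal`, and `away_team_goal` fields. The real schema may differ. Each match is split into two per-team rows with `$unwind`, so a team's home and away results can be grouped together. A win earns 3 points and a draw earns 1.

```python
from __future__ import annotations

from typing import Any


def standings_pipeline(league: str, season: str) -> list[dict[str, Any]]:
    """Aggregation pipeline that ranks the teams of one league season by points.

    Field names are hypothetical; adjust them to the actual match-document schema.
    """
    return [
        # 1. Keep only the matches of the requested league and season.
        {"$match": {"league": league, "season": season}},
        # 2. Turn each match into two rows, one from each team's perspective.
        {"$project": {
            "_id": 0,
            "sides": [
                {"team": "$home_team.name", "gf": "$home_team_goal", "ga": "$away_team_goal"},
                {"team": "$away_team.name", "gf": "$away_team_goal", "ga": "$home_team_goal"},
            ],
        }},
        {"$unwind": "$sides"},
        # 3. Aggregate per team: 3 points for a win, 1 for a draw, 0 for a loss.
        {"$group": {
            "_id": "$sides.team",
            "played": {"$sum": 1},
            "goals_for": {"$sum": "$sides.gf"},
            "goals_against": {"$sum": "$sides.ga"},
            "points": {"$sum": {"$switch": {
                "branches": [
                    {"case": {"$gt": ["$sides.gf", "$sides.ga"]}, "then": 3},
                    {"case": {"$eq": ["$sides.gf", "$sides.ga"]}, "then": 1},
                ],
                "default": 0,
            }}},
        }},
        # 4. Simplified tie-breakers: goal difference, then goals scored
        #    (some leagues apply head-to-head rules first).
        {"$addFields": {"goal_diff": {"$subtract": ["$goals_for", "$goals_against"]}}},
        {"$sort": {"points": -1, "goal_diff": -1, "goals_for": -1}},
    ]
```

With PyMongo, you would pass the list to `Collection.aggregate`, for example `db.matches.aggregate(standings_pipeline("England Premier League", "2015/2016"))`. The league name and season string assume the Kaggle database's naming conventions.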
Machine Learning Match Prediction
- Gradient Boosting model (best of 3 tested algorithms)
- 50% accuracy (typical for soccer prediction)
- 84% recall for home wins
- 15 features including team ratings and recent form
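Accuracy and per-class recall measure different things. That is how 84% recall for home wins can sit alongside 50% overall accuracy. The sketch below computes both in plain Python. It assumes three outcome labels, `"H"`, `"D"`, and `"A"` for home win, draw, and away win; these label names are hypothetical. The toy labels are made up for illustration and are not the project's results.

```python
from __future__ import annotations

from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of matches whose predicted outcome equals the actual outcome."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """Of the matches that actually ended with `label`, the fraction predicted as `label`."""
    predicted_for_actual = [p for t, p in zip(y_true, y_pred) if t == label]
    if not predicted_for_actual:
        return 0.0
    return sum(p == label for p in predicted_for_actual) / len(predicted_for_actual)


# Illustrative toy labels only: a model that leans towards predicting home wins.
y_true = ["H", "H", "H", "D", "A", "A"]
y_pred = ["H", "H", "H", "H", "H", "A"]
print(f"accuracy={accuracy(y_true, y_pred):.2f}")  # accuracy=0.67
for label in ("H", "D", "A"):
    print(label, recall(y_true, y_pred, label))    # H 1.0, D 0.0, A 0.5
```

A model that often predicts the most common outcome can score high recall on that class while missing draws entirely. In scikit-learn, `sklearn.metrics.classification_report` prints per-class recall alongside overall accuracy.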
ML Key Finding: Recent Form Matters Most
Feature-importance analysis showed that the difference in recent form between the two teams is the strongest predictor (22% of total importance), outweighing static FIFA ratings. This points to the value of temporal data in sports prediction.
- Form difference: 22% (most important)
- Defensive rating difference: 7.9%
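Here is one way to compute a form-difference feature. The definition is an assumption for illustration: average points per match over each team's last `n` games, with `n=5`. The dictionary keys `date`, `home`, `away`, `home_goals`, and `away_goals` are also assumed. The project's own feature may use a different window or weighting. Only matches played strictly before the prediction date are counted, so the feature never includes the result it is meant to predict.

```python
from __future__ import annotations

from datetime import date
from typing import Any


def team_form(matches: list[dict[str, Any]], team: str, before: date, n: int = 5) -> float:
    """Average points per match over `team`'s last `n` matches played strictly before `before`."""
    past = sorted(
        (m for m in matches if m["date"] < before and team in (m["home"], m["away"])),
        key=lambda m: m["date"],
    )
    recent = past[-n:]
    if not recent:
        return 0.0  # no history yet: return 0.0 (a simplifying assumption)
    points = 0
    for m in recent:
        # Read the score from this team's point of view.
        if m["home"] == team:
            scored, conceded = m["home_goals"], m["away_goals"]
        else:
            scored, conceded = m["away_goals"], m["home_goals"]
        points += 3 if scored > conceded else 1 if scored == conceded else 0
    return points / len(recent)


def form_difference(matches: list[dict[str, Any]], home: str, away: str,
                    before: date, n: int = 5) -> float:
    """Positive when the home side is in better recent form than the away side."""
    return team_form(matches, home, before, n) - team_form(matches, away, before, n)
```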
MongoDB Design Decisions
- Denormalized documents: Team and player info embedded in match documents for faster reads
- Temporal attributes: Player/team ratings stored as arrays within documents
- Strategic indexing: Indexes on date, league, season, team names for optimized queries
- Flexible schema: Naturally handles varying match events (goals, cards, substitutions)
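Below is a minimal sketch of the denormalization step. It assumes the Kaggle database's `Match` and `Team` column names (`match_api_id`, `home_team_api_id`, `team_long_name`, and so on). It also assumes a hypothetical pre-built `teams` lookup whose `attributes` field is a date-stamped array of rating snapshots. Each match document embeds both teams' names and the rating snapshot in effect on the match date, so read queries need no joins. `INDEX_SPECS` is a hypothetical list of index keys covering the indexed fields named above. The exact index set and field names are illustrative.

```python
from __future__ import annotations

from typing import Any, Optional


def latest_snapshot(snapshots: list[dict[str, Any]], as_of: str) -> Optional[dict[str, Any]]:
    """Most recent rating snapshot dated on or before `as_of`, or None.

    Dates are assumed to be strings in one fixed ISO-like format
    (e.g. "2010-02-22 00:00:00"), so string comparison orders them correctly.
    """
    eligible = [s for s in snapshots if s["date"] <= as_of]
    return max(eligible, key=lambda s: s["date"]) if eligible else None


def embed_team(team_id: int, team: dict[str, Any], as_of: str) -> dict[str, Any]:
    """Sub-document copied into the match: identity plus the ratings valid on `as_of`."""
    return {
        "id": team_id,
        "name": team["team_long_name"],
        "attributes": latest_snapshot(team["attributes"], as_of),
    }


def denormalize_match(row: dict[str, Any], teams: dict[int, dict[str, Any]],
                      league_name: str) -> dict[str, Any]:
    """Turn one relational match row into a self-contained match document."""
    home_id, away_id = row["home_team_api_id"], row["away_team_api_id"]
    return {
        "_id": row["match_api_id"],
        "date": row["date"],
        "season": row["season"],
        "league": league_name,
        "home_team": embed_team(home_id, teams[home_id], row["date"]),
        "away_team": embed_team(away_id, teams[away_id], row["date"]),
        "home_team_goal": row["home_team_goal"],
        "away_team_goal": row["away_team_goal"],
    }


# Hypothetical index keys: date, league + season, and embedded team names.
INDEX_SPECS = [
    [("date", -1)],
    [("league", 1), ("season", 1)],
    [("home_team.name", 1)],
    [("away_team.name", 1)],
]
```

With PyMongo, you could pass each entry to `Collection.create_index`, for example `for keys in INDEX_SPECS: db.matches.create_index(keys)`. The `matches` collection name is hypothetical.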
Dataset: Kaggle European Soccer Database
Comprehensive dataset spanning 2008-2016 across 11 European leagues: England, France, Germany, Italy, Spain, and the Netherlands, plus Portugal, Poland, Scotland, Belgium, and Switzerland.
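The migration starts by reading matches out of the SQLite file. Below is a minimal sketch using Python's built-in `sqlite3` module. It assumes the Kaggle schema's `Match` and `League` tables and column names. Only a handful of columns are selected; the full `Match` table has many more. The table name is double-quoted because `MATCH` is also an SQLite keyword.

```python
from __future__ import annotations

import sqlite3

# Every result column is aliased explicitly: without AS, SQLite leaves
# result-column names unspecified.
MATCH_QUERY = """
SELECT m.match_api_id      AS match_api_id,
       m.date              AS date,
       m.season            AS season,
       l.name              AS league,
       m.home_team_api_id  AS home_team_api_id,
       m.away_team_api_id  AS away_team_api_id,
       m.home_team_goal    AS home_team_goal,
       m.away_team_goal    AS away_team_goal
FROM "Match" AS m
JOIN League AS l ON l.id = m.league_id
ORDER BY m.date
"""


def load_matches(conn: sqlite3.Connection) -> list[dict]:
    """Return one plain dict per match, ready to be reshaped into a MongoDB document."""
    cursor = conn.execute(MATCH_QUERY)
    columns = [d[0] for d in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]
```

Usage would look like `load_matches(sqlite3.connect("database.sqlite"))`, where the file name is an assumption about the Kaggle download.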
Flask Web Application
- 17 routes
- 7 query pages
- 7 API endpoints
- 3K+ lines of code
Learning Outcomes
- NoSQL database design with denormalization strategies
- Complex MongoDB aggregation pipelines
- Feature engineering from database queries
- RESTful API design with Flask
- Full-stack development with AJAX dynamic loading