A complete football (soccer) datalake covering 92,671 players from Transfermarkt. It includes player profiles, performance statistics, market values, transfer histories, injury records, national team data, and teammate relationships.
- Total Players: 92,671 professional football players
- Total Teams: 2,175 clubs worldwide
- Geographic Scope: Global coverage of all major leagues
- Data Categories: 10 comprehensive data categories
Check out a sample of the dataset to get started.
datalake/transfermarkt/raw/
├── player_profiles/
├── player_performances/
├── player_market_values/
├── player_transfer_histories/
├── player_injury_histories/
├── player_national_team_performances/
└── player_teammates_played_with/
datalake/transfermarkt/raw/
├── teams_details/
├── teams_competitions_seasons/
└── teams_children/
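As a rough sketch of how the layout above could be traversed, the following uses only the Python standard library. It assumes each table directory contains one or more CSV files with a header row. The file format is not stated here, so treat `load_table` and the `*.csv` pattern as hypothetical and adjust them to the actual files.

```python
import csv
from pathlib import Path

# Root path taken from the directory layout above.
RAW_ROOT = Path("datalake/transfermarkt/raw")


def load_table(table_name, root=RAW_ROOT):
    """Read every CSV file under root/table_name into a list of dict rows.

    Assumption: tables are stored as UTF-8 CSV files with a header row.
    """
    rows = []
    for csv_path in sorted((Path(root) / table_name).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows
```

With that in place, `load_table("player_profiles")` would return one dictionary per profile row, keyed by column name. All values come back as strings, so numeric columns need converting before use.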
- 92,671 Player Profiles
- 1,878,719 Player Performances
- 901,457 Player Market Values
- 1,101,440 Player Transfer Histories
- 143,195 Player Injury Histories
- 92,701 Player National Team Performances
- 1,257,342 Player Teammates Played With
- 2,175 Teams Details
- 196,378 Teams Competitions Seasons
- 7,695 Teams Children
- Players Total Count: 5,467,525
- Teams Total Count: 206,248
- All Total Count: 5,673,773
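The totals above are plain sums of the per-table record counts. A short check, using only the figures listed in this section, makes the arithmetic explicit:

```python
# Record counts copied from the lists above.
player_tables = {
    "player_profiles": 92_671,
    "player_performances": 1_878_719,
    "player_market_values": 901_457,
    "player_transfer_histories": 1_101_440,
    "player_injury_histories": 143_195,
    "player_national_team_performances": 92_701,
    "player_teammates_played_with": 1_257_342,
}
team_tables = {
    "teams_details": 2_175,
    "teams_competitions_seasons": 196_378,
    "teams_children": 7_695,
}

players_total = sum(player_tables.values())
teams_total = sum(team_tables.values())
all_total = players_total + teams_total

print(players_total)  # 5467525
print(teams_total)    # 206248
print(all_total)      # 5673773
```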
erDiagram
PLAYER_PROFILES {
varchar player_id PK
varchar player_slug
varchar player_name
varchar player_image_url
varchar date_of_birth_url
date date_of_birth
varchar place_of_birth_country
varchar place_of_birth
varchar height
varchar citizenship_country
varchar citizenship
varchar position
varchar foot
varchar player_agent_url
varchar player_agent
varchar current_club_id FK
varchar current_club_url
date joined
date contract_expires
varchar social_media_url
varchar social_media
varchar player_main_position
varchar player_sub_position
}
PLAYER_MARKET_VALUES {
varchar player_id FK
bigint date_unix PK
int value
}
PLAYER_TRANSFER_HISTORIES {
varchar transfer_id PK
varchar player_id FK
varchar season
date date
varchar date_unformatted
varchar from_team_id FK
varchar from_team_url
varchar from_team_name
varchar to_team_id FK
varchar to_team_url
varchar to_team_name
int value_at_transfer
varchar transfer_fee
}
PLAYER_PERFORMANCES {
varchar player_id FK
varchar season
varchar competition_id FK
varchar competition_url
varchar competition_name
varchar team_id FK
varchar team_url
varchar team_name
int nb_in_group
int nb_on_pitch
int goals
int own_goals
int assists
int subed_in
int subed_out
int yellow_cards
int second_yellow_cards
int direct_red_cards
int penalty_goals
int minutes_played
int goals_conceded
int clean_sheets
}
PLAYER_TEAMMATES_PLAYED_WITH {
varchar player_id FK
varchar teammate_id FK
varchar player_with_url
varchar player_with_name
float ppg_played_with
int joint_goal_participation
int minutes_played_with
}
PLAYER_INJURY_HISTORIES {
varchar player_id FK
varchar season
varchar injury_reason
date from_date PK
date end_date
int days_missed
int games_missed
}
PLAYER_NATIONAL_TEAM_PERFORMANCES {
varchar player_id FK
varchar team_id FK
varchar team_url
varchar team_name
date first_game_date PK
int matches
int goals
}
TEAMS_DETAILS {
varchar club_id PK
varchar club_slug
varchar club_name
varchar logo_url
varchar country_name
varchar season_id
varchar competition_id FK
varchar competition_slug
varchar competition_name
varchar club_division
varchar source_url
}
TEAMS_CHILDREN {
varchar parent_team_id FK
varchar parent_team_name
varchar child_team_id FK
varchar child_team_name
}
TEAMS_COMPETITIONS_SEASONS {
varchar club_division
varchar club_id
varchar competition_id
varchar competition_name
int season_draws
int season_goal_difference
int season_goals_against
int season_goals_for
varchar season_id
boolean season_is_two_point_system
varchar season_league_competition_id
varchar season_league_league_name
varchar season_league_league_slug
varchar season_league_level_level_name
int season_league_level_level_number
varchar season_league_season_id
int season_losses
varchar season_manager
varchar season_manager_manager_id
varchar season_manager_manager_name
varchar season_manager_manager_slug
int season_points
int season_points_against
int season_points_for
int season_rank
varchar season_season
int season_total_matches
int season_wins
varchar team_name
}
COMPETITIONS {
varchar competition_id PK
varchar competition_slug
varchar competition_name
}
%% RELATIONSHIPS
PLAYER_PROFILES ||--o{ PLAYER_MARKET_VALUES : "has values"
PLAYER_PROFILES ||--o{ PLAYER_TRANSFER_HISTORIES : "has transfers"
PLAYER_PROFILES ||--o{ PLAYER_PERFORMANCES : "has performances"
PLAYER_PROFILES ||--o{ PLAYER_TEAMMATES_PLAYED_WITH : "played with"
PLAYER_PROFILES ||--o{ PLAYER_INJURY_HISTORIES : "has injuries"
PLAYER_PROFILES ||--o{ PLAYER_NATIONAL_TEAM_PERFORMANCES : "national team"
TEAMS_DETAILS ||--o{ TEAMS_CHILDREN : "parent/child"
TEAMS_DETAILS ||--o{ TEAMS_COMPETITIONS_SEASONS : "plays in"
COMPETITIONS ||--o{ TEAMS_DETAILS : "competition includes teams"
PLAYER_TRANSFER_HISTORIES }o--|| TEAMS_DETAILS : "from/to team"
PLAYER_PERFORMANCES }o--|| TEAMS_DETAILS : "performance for team"
PLAYER_PERFORMANCES }o--|| COMPETITIONS : "performance in comp"
PLAYER_NATIONAL_TEAM_PERFORMANCES }o--|| TEAMS_DETAILS : "national team"
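The relationships in the diagram translate directly into joins on `player_id`. As an illustration, here is a minimal sketch that pairs each profile with its most recent market value. It assumes rows are dictionaries keyed by the column names in the diagram (for example, as produced by `csv.DictReader`); the helper name `latest_market_values` is hypothetical. `date_unix` is only compared as an integer, so the sketch does not depend on whether it is stored in seconds or milliseconds.

```python
def latest_market_values(profiles, market_values):
    """Join PLAYER_PROFILES to PLAYER_MARKET_VALUES on player_id.

    Returns {player_id: (player_name, latest_value)} for every player
    that has at least one market value record.
    """
    latest = {}  # player_id -> (date_unix, value)
    for row in market_values:
        player_id = row["player_id"]
        stamp = int(row["date_unix"])
        # Keep only the record with the largest timestamp per player.
        if player_id not in latest or stamp > latest[player_id][0]:
            latest[player_id] = (stamp, int(row["value"]))

    return {
        profile["player_id"]: (profile["player_name"], latest[profile["player_id"]][1])
        for profile in profiles
        if profile["player_id"] in latest
    }
```

The same pattern applies to the other one-to-many tables (transfers, performances, injuries): group the child rows by `player_id`, then look each profile up in the grouped result.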
- Deduplication: Content hashing prevents duplicate data
- Incremental Updates: Only changed data is reprocessed
- Error Tracking: Failed URLs logged for monitoring
- Unicode Support: Proper handling of international characters
- Timestamp Tracking: All records include update timestamps
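To give a feel for how content hashing can prevent duplicates, here is a minimal sketch. It is not the pipeline's actual implementation, which is not described here: the choice of SHA-256, the JSON canonicalization, and the function names `content_hash` and `deduplicate` are all assumptions made for illustration.

```python
import hashlib
import json


def content_hash(record):
    """Hash a record's content so identical records get identical digests.

    Keys are sorted so field order does not affect the hash, and
    ensure_ascii=False keeps international characters as-is before
    UTF-8 encoding.
    """
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Return records with exact content duplicates removed, order preserved."""
    seen = set()
    unique = []
    for record in records:
        digest = content_hash(record)
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique
```

Storing the digest alongside each record would also support incremental updates under the same assumptions: a record only needs reprocessing when its new hash differs from the stored one.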
Most datasets give you a filtered, pre-processed view.
Working with raw football data lets you explore everything, from cleaning and organizing to deep analysis, giving you the opportunity to learn by doing.
- Explore Freely: Investigate the data your way and discover patterns on your own
- Develop Analytical Skills: Create your own metrics, KPIs, and ways of interpreting the game
- Experiment with Machine Learning: Train models on raw features to understand player performance, tactics, and trends
- Spot Hidden Insights: Learn to uncover trends that pre-processed datasets might hide
| Raw Data Aspect | How You Can Learn |
|---|---|
| Build Your Own Pipeline | Gain hands-on experience cleaning, structuring, and preparing large datasets |
| Deep Data Exploration | Practice exploratory data analysis (EDA), spot anomalies, and discover patterns |
| Efficient Data Handling | Learn to query, filter, and transform large datasets effectively |
| Visual Storytelling | Create your own charts and visualizations to communicate insights clearly |
| Combine Sources | Merge data from matches, players, and events to see the bigger picture and draw richer conclusions |
| Learn Through Iteration | Test different approaches, refine your methods, and see the impact of your analysis in real time |
Help maintain and expand this valuable football dataset:
Your sponsorship helps with:
- Regular Data Updates: Keep the dataset current
- Expanded Coverage: Add more leagues and competitions
- Infrastructure Costs: Server and storage maintenance
- Data Quality: Enhanced validation and processing
I'm always excited to collaborate on innovative football data projects. If you've got an idea, let's make it happen together!
- GitHub: @salimt
- LinkedIn: salimt
- Issues: Feel free to use GitHub Issues if you've got dataset-specific questions.
If you find this project useful, don't forget to drop a star on GitHub. It really helps others discover it too!
Contributions to the Nodeball Football Datalake are most welcome! If you want to contribute new fields, data improvements, or processing enhancements to this dataset, the instructions are quite simple:
- Fork the repo
- Set up your local environment
- Analyze the datalake structure in the datalake/ directory
- Start modifying data processing or creating new data extraction scripts
- If it's all looking good, create a pull request with your changes
- Data Quality: Report inconsistencies or missing data
- Processing Scripts: Improve data extraction and validation
- New Data Categories: Add new types of football data
- Data Cleaning: Help with validation and normalization
- Documentation: Improve dataset documentation
football-data soccer-dataset transfermarkt-data player-statistics football-analytics soccer-analytics sports-data football-research player-performance transfer-market football-database soccer-database sports-dataset football-datalake soccer-datalake
Built with ⚽ by salimt
"Complete football datalake - no player left behind."