
Machine Learning (Soccer)

Summary

This multiclass classification project for predicting football match outcomes employed a variety of approaches and models. Below is a detailed summary of the steps, methods, and reasoning behind each decision:

Models Used

XGBoost

Model: XGBClassifier

Why it was used: Efficient for multiclass classification, provides overfitting control, and is optimized for large-scale datasets.

CatBoost

Model: CatBoostClassifier

Why it was used: Handles categorical variables natively, without requiring manual encoding such as one-hot encoding. It also performs well on imbalanced datasets.

Random Forest

Model: RandomForestClassifier

Why it was used: Robust against overfitting due to the use of multiple decision trees and can capture complex interactions between variables.
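A brief sketch of the ensemble idea described above, again on synthetic data with illustrative parameter values: averaging many randomized trees reduces the variance that makes a single tree overfit.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# n_estimators=200 is illustrative; each tree sees a bootstrap sample
# and a random subset of features at each split
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```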

Logistic Regression

Model: LogisticRegression

Why it was used: A simple linear model used as a baseline to compare with more complex models.

Support Vector Machine (SVM)

Model: LinearSVC

Why it was used: Efficient classifier for linearly separable problems, with built-in regularization to avoid overfitting.

Decision Tree

Model: DecisionTreeClassifier

Why it was used: An interpretable model that creates data splits based on features, used as a baseline for comparison with more sophisticated models.
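The three baseline models above (Logistic Regression, LinearSVC, Decision Tree) can be fitted and compared side by side. This is an illustrative sketch on synthetic data with near-default hyperparameters, not the project's actual comparison:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=3, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Simple baselines; hyperparameters shown are defaults / illustrative
baselines = {
    "logreg": LogisticRegression(max_iter=1000),
    "linear_svc": LinearSVC(C=1.0, max_iter=5000),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=1),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
          for name, m in baselines.items()}
```

Comparing the more complex models against these scores shows whether the added complexity actually buys predictive power.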

Neural Network (MLP - Multilayer Perceptron)

Model: Sequential (Keras)

Why it was used: Designed to capture complex patterns in data through multiple dense layers. Dropout and L1/L2 regularization were applied to prevent overfitting.
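A Keras Sequential sketch matching the description above: dense layers with Dropout and L1/L2 kernel regularization before a softmax output for the three classes. Layer sizes, dropout rates, and penalty strengths are illustrative assumptions, not the project's tuned values.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10)).astype("float32")  # synthetic features
y = rng.integers(0, 3, 300)                       # 3 match outcomes

model = keras.Sequential([
    layers.Input(shape=(10,)),
    # L1/L2 penalty on the weights plus Dropout, to limit overfitting
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4)),
    layers.Dropout(0.3),
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(3, activation="softmax"),  # one probability per class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
probs = model.predict(X, verbose=0)
```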

Evaluation Techniques Used

ROC Curve (Receiver Operating Characteristic)

Why it was used: The ROC curve evaluates classifier performance by plotting the true positive rate (sensitivity) against the false positive rate (1 − specificity) across decision thresholds.

Method applied: ROC curves were used to visualize multiclass classification performance, with AUC (Area Under the Curve) calculated for each class.
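One common way to get per-class ROC/AUC in the multiclass setting, sketched here with a logistic-regression scorer on synthetic data, is one-vs-rest: binarize the labels and compute one curve and one AUC per class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=3, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)
probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)

# One-vs-rest: turn each class into a binary problem, one AUC per class
y_bin = label_binarize(y_te, classes=[0, 1, 2])
aucs = {}
for k in range(3):
    fpr, tpr, _ = roc_curve(y_bin[:, k], probs[:, k])
    aucs[k] = auc(fpr, tpr)
```

Plotting `fpr` against `tpr` for each class gives the per-class curves described above.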

Confusion Matrix

Why it was used: Provides a detailed view of how the model correctly or incorrectly classifies each class. Displays true positives, false positives, true negatives, and false negatives.

Method applied: The confusion_matrix function was used to compute the matrix, and ConfusionMatrixDisplay was applied for visualization, helping to better understand model performance per class.
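The `confusion_matrix` / `ConfusionMatrixDisplay` combination mentioned above can be sketched as follows (synthetic data; the display labels are hypothetical outcome names):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)
y_pred = RandomForestClassifier(random_state=3).fit(X_tr, y_tr).predict(X_te)

# Rows = true class, columns = predicted class
cm = confusion_matrix(y_te, y_pred, labels=[0, 1, 2])
disp = ConfusionMatrixDisplay(cm, display_labels=["home", "draw", "away"])
# disp.plot() would render the heatmap; omitted here to stay headless
```

The diagonal holds the correct predictions per class; off-diagonal cells show exactly which outcomes are confused with which.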

Auxiliary Techniques

Cross-Validation (K-Fold)

Why it was used: K-Fold cross-validation ensures the model is evaluated across different data splits, offering a better sense of its generalization ability.

Method applied: KFold was used to split the dataset into multiple folds and compute average accuracy, aiding in identifying overfitting or underfitting.
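The K-Fold procedure described above amounts to the following loop, sketched here with 5 folds and a logistic-regression model on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=4)

# Each fold trains on 4/5 of the data and scores on the held-out 1/5
kf = KFold(n_splits=5, shuffle=True, random_state=4)
scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
mean_acc = float(np.mean(scores))
```

A large gap between per-fold scores (or between training and fold accuracy) is the overfitting/underfitting signal the summary refers to.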

Manual Hyperparameter Search

Why it was used: Hyperparameter tuning is essential to improve model performance, adjusting values such as neuron counts, dropout rates, and learning rates.

Method applied: A manual search was conducted to test several parameter combinations, aiming to maximize the model’s accuracy.
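A manual search of this kind boils down to looping over candidate settings and keeping the best cross-validated score. The grid below is hypothetical (the project's actual parameter values are not listed in this summary), shown with a random forest for brevity:

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=5)

# Hypothetical parameter grid; evaluate every combination
grid = {"n_estimators": [50, 100], "max_depth": [3, 6]}
best_acc, best_params = -1.0, None
for n_est, depth in product(grid["n_estimators"], grid["max_depth"]):
    model = RandomForestClassifier(n_estimators=n_est, max_depth=depth,
                                   random_state=5)
    acc = cross_val_score(model, X, y, cv=3).mean()
    if acc > best_acc:
        best_acc, best_params = acc, (n_est, depth)
```

For a neural network, the same loop would iterate over neuron counts, dropout rates, and learning rates, as described above.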

Conclusion

This project combined machine learning techniques, neural networks, and robust evaluation strategies. Models like XGBoost, CatBoost, and Neural Networks were assessed using ROC Curves and Confusion Matrices, while Decision Trees and Random Forests provided interpretable baselines. Cross-validation and hyperparameter tuning ensured that models were well-fitted and properly validated, resulting in a complete approach to football match outcome prediction.
