This project is a Machine Learning-based Spam Detector that classifies messages as spam or not spam using a trained Naive Bayes model. The model is built using scikit-learn and is packaged inside a Docker container for easy deployment and usage.
- Trains a Naive Bayes model using a labeled dataset of SMS messages.
- Uses TF-IDF vectorization to transform text into numerical features.
- Saves the trained model and vectorizer for future predictions.
- Provides a Dockerized environment for easy execution.
- Allows spam detection for new messages via a command-line interface.
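To make the TF-IDF step concrete, here is a minimal pure-Python sketch of how messages can become weighted numeric vectors. It uses the smoothed IDF formula and L2 normalisation that scikit-learn documents as the `TfidfVectorizer` defaults. The whitespace tokenizer, the function name `tfidf_vectors`, and the sample messages are assumptions made for illustration, so the numbers will not match the project's vectorizer on real text.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, rows); each row is an L2-normalised TF-IDF vector."""
    # Simplified tokenisation (assumption): lowercase and split on whitespace.
    tokenised = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenised for term in tokens})
    n_docs = len(tokenised)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))

    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}

    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        raw = [counts[t] * idf[t] for t in vocabulary]
        # Scale each vector to unit length; guard against an all-zero row.
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        rows.append([v / norm for v in raw])
    return vocabulary, rows


# Made-up sample messages, for illustration only.
messages = ["win a free prize now", "are we meeting now", "free free prize"]
vocab, vectors = tfidf_vectors(messages)
for message, vector in zip(messages, vectors):
    weights = {t: round(w, 3) for t, w in zip(vocab, vector) if w > 0}
    print(message, "->", weights)
```

Terms that appear in fewer messages (such as "win") receive a higher weight than terms shared across messages (such as "now"), which is what lets the classifier focus on distinctive words.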
The model is trained on a publicly available SMS spam dataset: https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv
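As a rough illustration of the data format, the sketch below parses rows laid out as a label, a tab, and the message text, with no header row. That layout is an assumption about `sms.tsv` and should be checked against the file itself. The two sample rows and the helper name `read_labelled_messages` are made up for this example, and the actual script may load the data differently (for instance with pandas).

```python
import csv
import io

# Two made-up rows in the assumed layout: label, a tab, then the message text.
SAMPLE_TSV = "ham\tSee you at lunch?\nspam\tWIN a FREE prize, reply \"YES\" now!\n"


def read_labelled_messages(handle):
    """Parse label<TAB>message rows into (label, message) tuples."""
    # QUOTE_NONE keeps quotation marks inside messages as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        label, message = row
        rows.append((label, message))
    return rows


rows = read_labelled_messages(io.StringIO(SAMPLE_TSV))
# Encode the labels numerically: 1 for spam, 0 for everything else.
labels = [1 if label == "spam" else 0 for label, _ in rows]
print(rows)
print(labels)
```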

    📂 email_spam_detection
    ├── Dockerfile
    ├── email_spam_detector.py          # Script to train and save the model
    ├── predict.py                      # Script to predict if a message is spam
    ├── requirements.txt                # Python dependencies
    ├── spam_detector_model.pkl         # Trained model (saved)
    ├── spam_detector_vectorizer.pkl    # TF-IDF vectorizer (saved)
    └── README.md

To get started, you'll first need to clone the repository. Then, you can choose to run the project using Docker or locally with Python.
1. Clone the repository:

       git clone git@github.com:sempedia/email_spam_detector.git
### Running with Docker
Ensure you have **Docker** installed on your machine. Then, follow these steps:
1. **Build the Docker Image:**
   ```sh
   docker build -t spam-detection .
   ```
2. **Run the Model Training Script:**
   ```sh
   docker run --rm spam-detection
   ```
3. **Run a Prediction on a Sample Message:**
   ```sh
   docker run --rm spam-detection python predict.py "Win a free iPhone now!"
   ```

The `email_spam_detector.py` script downloads the dataset, trains the model, and saves it for future use.

To run it manually (without Docker):

    python email_spam_detector.py

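The train-and-save flow described above can be sketched roughly as follows. This is not the project's actual script: the four toy messages stand in for the downloaded dataset, `MultinomialNB` is an assumed choice of Naive Bayes variant, and the files are written to a temporary directory so the example does not overwrite anything. Only the two file names are taken from the project structure.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in data (assumption); the real script trains on the SMS dataset.
messages = [
    "Win a free prize now",
    "Free entry, claim your cash prize",
    "Are we still meeting for lunch",
    "Can you call me after work",
]
labels = ["spam", "spam", "ham", "ham"]

# Learn the vocabulary and turn each message into a TF-IDF feature vector.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(messages)

# Fit a multinomial Naive Bayes classifier (assumed variant) on those features.
model = MultinomialNB()
model.fit(features, labels)

# Persist both objects; prediction needs the same fitted vectorizer.
output_dir = Path(tempfile.mkdtemp())  # temporary directory for this sketch only
model_path = output_dir / "spam_detector_model.pkl"
vectorizer_path = output_dir / "spam_detector_vectorizer.pkl"
joblib.dump(model, model_path)
joblib.dump(vectorizer, vectorizer_path)

print("Saved:", model_path.name, vectorizer_path.name)
```

Saving the vectorizer alongside the model matters because new messages must be mapped onto the same vocabulary and weights the classifier was trained on.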
The `predict.py` script loads the trained model and classifies a given message as Spam or Not Spam.

Example usage:

    python predict.py "Congratulations! You've won a free vacation!"

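A prediction step along these lines might look like the sketch below. The helper names `load_artifacts` and `classify` are hypothetical, and the code assumes the model was trained with the string labels "spam" and "ham"; the real `predict.py` may be organised differently.

```python
import joblib


def load_artifacts(model_path, vectorizer_path):
    """Load the saved classifier and the vectorizer fitted alongside it."""
    return joblib.load(model_path), joblib.load(vectorizer_path)


def classify(message, model, vectorizer):
    """Return "Spam" or "Not Spam" for a single message."""
    # Reuse the fitted vectorizer: transform only, never refit at prediction time.
    features = vectorizer.transform([message])
    label = model.predict(features)[0]
    # Assumes the model was trained with the string labels "spam" and "ham".
    return "Spam" if label == "spam" else "Not Spam"
```

A command-line wrapper could then call `load_artifacts("spam_detector_model.pkl", "spam_detector_vectorizer.pkl")` once and pass the first command-line argument to `classify`.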
This project requires:

- Python 3.9+
- pandas
- scikit-learn
- joblib
- scipy
- numpy

The `requirements.txt` file is included in the project for easy installation. If running locally, install the dependencies with:

    pip install -r requirements.txt

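After installing, a quick way to confirm that the listed packages are importable is a check like the one below. It is an optional sketch, not part of the project; the helper name `missing_modules` is made up, and the list uses import names, so scikit-learn appears as `sklearn`.

```python
import importlib.util

# Import names for the listed dependencies; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["pandas", "sklearn", "joblib", "scipy", "numpy"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing:", ", ".join(missing) if missing else "none")
```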
This project is open-source and available for use under the MIT License.
Author: Alina Bazavan (GitHub profile: https://github.com/sempedia)
Happy coding! 🚀
Contributions are welcome! If you have suggestions for improvements or find bugs, please open an issue or submit a pull request.