This repository provides a comprehensive PyTorch implementation of the Transformer architecture introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017). The implementation includes the full encoder-decoder structure with multi-head attention, positional encoding, and residual connections.
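At the core of the multi-head attention blocks is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The snippet below is a minimal, illustrative sketch of that operation in PyTorch; it is not the repository's own class, and the shapes simply mirror the default configuration (8 heads, d_key = 64):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Illustrative scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v), weights

# Toy tensors: batch=2, heads=8, seq_len=10, d_k=64
q = k = v = torch.randn(2, 8, 10, 64)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 10, 64])
```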
- Complete Transformer model implementation
- Training pipeline with configurable hyperparameters
- Customizable dataset handling
- PyTorch-based implementation
| Package | Minimum Version |
|---|---|
| Python | 3.9.6 |
| PyTorch | 1.12.0 |
| TorchText | 0.13.0 |
| NumPy | 1.26.4 |
| UV | Latest |
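A quick way to check that your environment meets these minimums (an optional sanity check, not part of the project itself):

```python
import sys
import numpy
import torch
import torchtext

print("Python   ", sys.version.split()[0])
print("PyTorch  ", torch.__version__)
print("TorchText", torchtext.__version__)
print("NumPy    ", numpy.__version__)
```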
.
├── config/ # Configuration files (hyperparameters, paths)
│ └── config.yaml # YAML configuration file
├── datasets/ # Dataset handling
│ ├── dataset # Text files for the dataset (sentences and vocabularies)
│ ├── dataset.py # Dataset and dataloader classes for training and inference
│ └── util.py # Utilities for reading data from the text files
├── models/ # Transformer implementation
│ ├── embedding/ # Token embedding and positional embedding
│ ├── layers/ # Encoder and Decoder
│ ├── model/ # Model architecture
│ └── utils/ # Model utilities, such as masking
├── train.py # Main training script
├── infer.py # Main inference script
├── requirements.txt # Project dependencies
└── README.md # This documentation
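As an illustration of what the masking utilities under models/utils/ typically provide, here is a sketch of a padding mask combined with a look-ahead (subsequent) mask. Function names and mask shapes are assumptions made for the example, not the repository's actual API:

```python
import torch

def make_pad_mask(seq, pad_idx=0):
    """True where tokens are real, False at padding; shape (batch, 1, 1, seq_len)."""
    return (seq != pad_idx).unsqueeze(1).unsqueeze(2)

def make_subsequent_mask(size):
    """Lower-triangular mask so position i can only attend to positions <= i."""
    return torch.tril(torch.ones(size, size, dtype=torch.bool))

tgt = torch.tensor([[5, 7, 9, 0, 0]])  # one target sequence padded with 0
mask = make_pad_mask(tgt) & make_subsequent_mask(tgt.size(1))
print(mask.shape)  # torch.Size([1, 1, 5, 5])
```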
git clone [email protected]:fanfan-yu/transformer.git
cd transformer
pip install uv
# Initialize environment and install dependencies
uv init
uv sync
# Start training the Transformer model
uv run train.py
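Conceptually, training wires the values from config/config.yaml into a standard PyTorch loop. The toy example below illustrates that flow end to end; it uses torch.nn.Transformer and random data as stand-ins for the repository's own model and dataset classes, so treat it as a sketch rather than the contents of train.py:

```python
import torch
import torch.nn as nn

# Stand-ins for the repository's classes, using the default config values
vocab_size, d_model = 100, 512
embed = nn.Embedding(vocab_size, d_model)
model = nn.Transformer(d_model=d_model, nhead=8, num_encoder_layers=6,
                       num_decoder_layers=6, dim_feedforward=2048,
                       dropout=0.1, batch_first=True)
head = nn.Linear(d_model, vocab_size)

params = list(embed.parameters()) + list(model.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=0.001)  # train.learning_rate
criterion = nn.CrossEntropyLoss()

# One training step on random token ids (batch_size=2, seq_len=10)
src = torch.randint(0, vocab_size, (2, 10))
tgt = torch.randint(0, vocab_size, (2, 10))
logits = head(model(embed(src), embed(tgt)))
loss = criterion(logits.reshape(-1, vocab_size), tgt.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```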
The config/config.yaml file contains all configurable parameters:
path:
sentences_path: ./datasets/dataset/sentences.txt
src_vocab_path: ./datasets/dataset/src_vocab.txt
tgt_vocab_path: ./datasets/dataset/tgt_vocab.txt
# model parameters
model:
d_model: 512
n_head: 8
d_key: 64
d_value: 64
d_feedforward: 2048
max_len: 5000
num_encoder_layers: 6
num_decoder_layers: 6
# train parameters
train:
batch_size: 2
epoch: 20
learning_rate: 0.001
dropout: 0.1
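If you need these settings outside the training script, the file can be loaded with PyYAML (assumed to be available in the environment; the keys mirror the sections above):

```python
import yaml  # PyYAML; assumed available alongside the listed dependencies

with open("config/config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["model"]["d_model"])        # 512
print(cfg["train"]["learning_rate"])  # 0.001
```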
- Implement inference
- Add support for external datasets (WMT, WikiText)
- Create Jupyter Notebook tutorials for beginners
- Optimize training for GPU environments
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create your feature branch (git checkout -b feat/your-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin feat/your-feature)
- Open a pull request
This project is licensed under the MIT License - see the LICENSE file for details.