This repository provides a comprehensive PyTorch implementation of the Transformer architecture introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017). The implementation includes the full encoder-decoder structure with multi-head attention, positional encoding, and residual connections.
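At the core of the multi-head attention blocks is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The snippet below is a minimal, illustrative sketch of that operation in PyTorch; it is not the repository's own class, and the shapes simply mirror the default configuration (8 heads, d_key = 64):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Illustrative scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v), weights

# Toy tensors: batch=2, heads=8, seq_len=10, d_k=64
q = k = v = torch.randn(2, 8, 10, 64)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 10, 64])
```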
- Complete Transformer model implementation
- Training pipeline with configurable hyperparameters
- Customizable dataset handling
- PyTorch-based implementation
| Package | Minimum Version |
|---|---|
| Python | 3.9.6 |
| PyTorch | 1.12.0 |
| TorchText | 0.13.0 |
| NumPy | 1.26.4 |
| UV | Latest |
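A quick way to check that your environment meets these minimums (an optional sanity check, not part of the project itself):

```python
import sys
import numpy
import torch
import torchtext

print("Python   ", sys.version.split()[0])
print("PyTorch  ", torch.__version__)
print("TorchText", torchtext.__version__)
print("NumPy    ", numpy.__version__)
```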
.
├── config/ # Configuration files (hyperparameters, paths)
│ └── config.yaml # YAML configuration file
├── datasets/ # Dataset handling
│ ├── dataset # Text files for the dataset (sentences and vocabularies)
│ ├── dataset.py # Dataset and dataloader classes for training and inference
│ └── util.py # Utilities for reading data from the text files
├── models/ # Transformer implementation
│ ├── embedding/ # Token embedding and positional embedding
│ ├── layers/ # Encoder and Decoder
│ ├── model/ # Model architecture
│ └── utils/ # Model utilities, such as masking
├── train.py # Main training script
├── infer.py # Main inference script
├── requirements.txt # Project dependencies
└── README.md # This documentation
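As an illustration of what the masking utilities under models/utils/ typically provide, here is a sketch of a padding mask combined with a look-ahead (subsequent) mask. Function names and mask shapes are assumptions made for the example, not the repository's actual API:

```python
import torch

def make_pad_mask(seq, pad_idx=0):
    """True where tokens are real, False at padding; shape (batch, 1, 1, seq_len)."""
    return (seq != pad_idx).unsqueeze(1).unsqueeze(2)

def make_subsequent_mask(size):
    """Lower-triangular mask so position i can only attend to positions <= i."""
    return torch.tril(torch.ones(size, size, dtype=torch.bool))

tgt = torch.tensor([[5, 7, 9, 0, 0]])  # one target sequence padded with 0
mask = make_pad_mask(tgt) & make_subsequent_mask(tgt.size(1))
print(mask.shape)  # torch.Size([1, 1, 5, 5])
```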
git clone [email protected]:fanfan-yu/transformer.git
cd transformer
pip install uv
# Initialize environment and install dependencies
uv init
uv sync
# Start training the Transformer model
uv run train.py
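Conceptually, training wires the values from config/config.yaml into a standard PyTorch loop. The toy example below illustrates that flow end to end; it uses torch.nn.Transformer and random data as stand-ins for the repository's own model and dataset classes, so treat it as a sketch rather than the contents of train.py:

```python
import torch
import torch.nn as nn

# Stand-ins for the repository's classes, using the default config values
vocab_size, d_model = 100, 512
embed = nn.Embedding(vocab_size, d_model)
model = nn.Transformer(d_model=d_model, nhead=8, num_encoder_layers=6,
                       num_decoder_layers=6, dim_feedforward=2048,
                       dropout=0.1, batch_first=True)
head = nn.Linear(d_model, vocab_size)

params = list(embed.parameters()) + list(model.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=0.001)  # train.learning_rate
criterion = nn.CrossEntropyLoss()

# One training step on random token ids (batch_size=2, seq_len=10)
src = torch.randint(0, vocab_size, (2, 10))
tgt = torch.randint(0, vocab_size, (2, 10))
logits = head(model(embed(src), embed(tgt)))
loss = criterion(logits.reshape(-1, vocab_size), tgt.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```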
The config/config.yaml file contains all configurable parameters:
path:
sentences_path: ./datasets/dataset/sentences.txt
src_vocab_path: ./datasets/dataset/src_vocab.txt
tgt_vocab_path: ./datasets/dataset/tgt_vocab.txt
# model parameters
model:
d_model: 512
n_head: 8
d_key: 64
d_value: 64
d_feedforward: 2048
max_len: 5000
num_encoder_layers: 6
num_decoder_layers: 6
# train parameters
train:
batch_size: 2
epoch: 20
learning_rate: 0.001
dropout: 0.1
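If you need these settings outside the training script, the file can be loaded with PyYAML (assumed to be available in the environment; the keys mirror the sections above):

```python
import yaml  # PyYAML; assumed available alongside the listed dependencies

with open("config/config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["model"]["d_model"])        # 512
print(cfg["train"]["learning_rate"])  # 0.001
```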
- Implement inference
- Add support for external datasets (WMT, WikiText)
- Create Jupyter Notebook tutorials for beginners
- Optimize training for GPU environments
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create your feature branch (git checkout -b feat/your-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin feat/your-feature)
- Open a pull request
This project is licensed under the MIT License - see the LICENSE file for details.