This repository explores knowledge distillation in Vision Transformers (ViTs) with a focus on attention maps.
The student model is trained to mimic the teacher model's attention maps in addition to its outputs, encouraging it to attend to the same image regions as the teacher.
Attention maps are generated using attention rollout, providing insights into where the model focuses its attention during inference.
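
The sketch below shows one common way to compute attention rollout: average the heads in each layer, add the residual (identity) connection, re-normalize, and multiply the maps across layers. The `attentions` input (a list of per-layer `(batch, heads, tokens, tokens)` tensors) and the choice to keep only the CLS-to-patch row are assumptions for illustration, not necessarily this repository's exact implementation.

```python
import torch

def attention_rollout(attentions):
    """Fuse per-layer attention tensors into a single token-to-token map."""
    batch, _, tokens, _ = attentions[0].shape
    rollout = torch.eye(tokens, device=attentions[0].device).expand(batch, tokens, tokens)
    for attn in attentions:
        attn = attn.mean(dim=1)                              # average over heads
        attn = attn + torch.eye(tokens, device=attn.device)  # account for residual connection
        attn = attn / attn.sum(dim=-1, keepdim=True)         # re-normalize rows
        rollout = attn @ rollout                             # propagate through the layer
    # Attention of the CLS token to the image patches (drop the CLS-to-CLS entry).
    return rollout[:, 0, 1:]
```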
A distillation loss is defined over these attention maps, penalizing the student when its attention diverges from the teacher's.
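
As a rough illustration, such a loss could combine the usual cross-entropy term with an attention-matching term over the rollout maps. The MSE term, the weight `alpha`, and the function signature below are illustrative assumptions, not the repository's exact formulation.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, labels, student_rollout, teacher_rollout, alpha=0.5):
    # Standard supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # Penalize the student for attending to different patches than the teacher.
    attn = F.mse_loss(student_rollout, teacher_rollout.detach())
    return ce + alpha * attn
```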

A Deeplake/Activeloop login is required to access the ImageNet dataset.
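
A minimal sketch of streaming the dataset after logging in (e.g. via `activeloop login` or an API token) might look like the following; the dataset path `hub://activeloop/imagenet-train` and the loader settings are assumptions rather than this repository's exact configuration.

```python
import deeplake

# Requires prior Activeloop authentication for gated datasets such as ImageNet.
ds = deeplake.load("hub://activeloop/imagenet-train")
dataloader = ds.pytorch(num_workers=2, batch_size=32, shuffle=True)
```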