This repository fine-tunes the Voxtral speech model on conversational speech datasets using the Hugging Face transformers and datasets libraries.
```bash
git clone https://github.com/Deep-unlearning/Finetune-Voxtral-ASR.git
cd Finetune-Voxtral-ASR
```

Choose your preferred package manager:
📦 Using UV (recommended)
```bash
uv venv .venv --python 3.10 && source .venv/bin/activate
uv pip install -r requirements.txt
```

🐍 Using pip
```bash
python3.10 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

Dataset Preparation

The training script loads `hf-audio/esb-datasets-test-only-sorted` with the `voxpopuli` config, casts the audio column to 16 kHz, and keeps only a small train/eval slice. The `VoxtralDataCollator` implements the Voxtral/LLaMA-style prompt + label masking described below.
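A minimal sketch of that loading step with the `datasets` library; the split name and slice sizes here are assumptions, so check `train.py` for the exact values:

```python
from datasets import load_dataset, Audio

# Load the VoxPopuli config of the ESB test-only benchmark and resample audio to 16 kHz.
ds = load_dataset("hf-audio/esb-datasets-test-only-sorted", "voxpopuli", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16000))

# Carve out small train/eval slices (sizes are illustrative).
train_ds = ds.select(range(100))
eval_ds = ds.select(range(100, 120))
```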
For ASR fine-tuning, each example looks like:

- Inputs: `[AUDIO] … [AUDIO] <transcribe> <reference transcription>`
- Labels: the same sequence, but the prefix `[AUDIO] … [AUDIO] <transcribe>` is masked with `-100` so the loss is computed only on the transcription tokens.
The VoxtralDataCollator already builds this sequence (prompt expansion via the processor and label masking).
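The masking itself amounts to copying the input ids and overwriting the prompt prefix with `-100`, the index ignored by PyTorch's cross-entropy loss. A minimal sketch of the idea, not the actual collator code (`prompt_len` stands for whatever length the expanded audio prompt ends up being):

```python
import torch

def build_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    # Labels start as a copy of the input ids...
    labels = input_ids.clone()
    # ...then the audio/prompt prefix is hidden from the loss,
    # so only the transcription tokens contribute.
    labels[:prompt_len] = -100
    return labels
```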
The dataset only needs two fields:
```
{
    "audio": {"array": <float32 numpy array>, "sampling_rate": 16000, ...},
    "text": "<reference transcription>"
}
```

If you want to swap in a different dataset, ensure that after loading you still have:
- an `audio` column (cast to `Audio(sampling_rate=16000)`), and
- a `text` column (the reference transcription).
If your dataset uses different column names, map them to `audio` and `text` before returning, as in the sketch below.
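For example, a hedged sketch of adapting a dataset whose columns are named differently; the dataset id and original column names here are placeholders:

```python
from datasets import load_dataset, Audio

# Hypothetical dataset with "recording"/"sentence" columns instead of "audio"/"text".
ds = load_dataset("your-username/your-asr-dataset", split="train")
ds = ds.rename_column("recording", "audio")
ds = ds.rename_column("sentence", "text")
ds = ds.cast_column("audio", Audio(sampling_rate=16000))
```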
Run the training script:
```bash
uv run train.py
```

Logs and checkpoints will be saved under the `outputs/` directory by default.
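The output location and the rest of the run configuration live inside `train.py`. As a rough idea of what such a setup typically looks like with `transformers`' `TrainingArguments` (values here are illustrative, not the script's actual settings):

```python
from transformers import TrainingArguments

# Illustrative values only; the real configuration is defined in train.py.
training_args = TrainingArguments(
    output_dir="outputs",            # where logs and checkpoints are written
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    num_train_epochs=1,
    logging_steps=10,
    save_steps=100,
)
```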
You can also run the training script with LoRA; a sketch of a typical LoRA configuration is shown below, followed by the launch command.
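A minimal sketch of attaching LoRA adapters with the `peft` library, assuming a `transformers` release with Voxtral support; the checkpoint name, hyperparameters, and target modules are illustrative, so check `train_lora.py` for the actual setup:

```python
from peft import LoraConfig, get_peft_model
from transformers import VoxtralForConditionalGeneration

# Illustrative checkpoint; train_lora.py may load a different one.
model = VoxtralForConditionalGeneration.from_pretrained("mistralai/Voxtral-Mini-3B-2507")

lora_config = LoraConfig(
    r=16,                              # rank of the low-rank update
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections (illustrative)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # confirm only a small fraction of weights are trainable
```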
```bash
uv run train_lora.py
```

Happy fine-tuning Voxtral! 🚀