A fast CPU-first video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces.

winstxnhdw/CapGen

CapGen

A fast, cross-platform, CPU-first, English-only video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces. A pip-installable offline CLI tool with CUDA support is also provided. Voice Activity Detection (VAD) preprocessing is always enabled.

Requirements

  • Python 3.9
  • 4 GB RAM

Usage (API)

Send a request to the endpoint with cURL as follows. Currently, the only available caption formats are srt, vtt and txt.

curl "https://winstxnhdw-CapGen.hf.space/api/v2/transcribe?caption_format=$CAPTION_FORMAT" \
  -F "file=@$AUDIO_FILE_PATH"

You can also redirect the output to a file.

curl "https://winstxnhdw-CapGen.hf.space/api/v2/transcribe?caption_format=$CAPTION_FORMAT" \
  -F "file=@$AUDIO_FILE_PATH" | jq -r ".result" > result.srt
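
The same request can also be made from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names (`build_transcribe_url`, `encode_multipart`, `transcribe`) are hypothetical. It assumes the response is JSON with a `result` field, as the jq filter above suggests, and that the upload is sent as multipart form data under the field name `file`, as in the cURL call.

```python
import json
import mimetypes
import urllib.parse
import urllib.request
import uuid
from pathlib import Path

# Endpoint taken from the cURL examples above.
BASE_URL = "https://winstxnhdw-CapGen.hf.space/api/v2/transcribe"
CAPTION_FORMATS = {"srt", "vtt", "txt"}


def build_transcribe_url(caption_format: str, base_url: str = BASE_URL) -> str:
    """Return the endpoint URL with the caption_format query parameter set."""
    if caption_format not in CAPTION_FORMATS:
        raise ValueError(f"unsupported caption format: {caption_format!r}")
    query = urllib.parse.urlencode({"caption_format": caption_format})
    return f"{base_url}?{query}"


def encode_multipart(filename: str, content: bytes, boundary: str) -> bytes:
    """Encode one file as multipart/form-data under the form field `file`."""
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    return head + content + f"\r\n--{boundary}--\r\n".encode()


def transcribe(path: str, caption_format: str = "srt") -> str:
    """Upload the file and return the captions.

    Assumption: the response body is JSON with a `result` field.
    """
    file = Path(path)
    boundary = uuid.uuid4().hex
    request = urllib.request.Request(
        build_transcribe_url(caption_format),
        data=encode_multipart(file.name, file.read_bytes(), boundary),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["result"]
```

Under these assumptions, `transcribe("audio.mp3", "vtt")` would return the caption text, which can then be written to a file such as `result.vtt`.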

You can stream the captions in real-time with the following.

curl -N "https://winstxnhdw-CapGen.hf.space/api/v2/transcribe/stream?caption_format=$CAPTION_FORMAT" \
  -F "file=@$AUDIO_FILE_PATH"

Usage (CLI)

CapGen is available as a CLI tool with CUDA support. You can install it with pip.

pip install "capgen-cli @ git+https://github.com/winstxnhdw/CapGen#subdirectory=cli"

You may also install capgen with the necessary CUDA binaries.

pip install "capgen-cli[cuda] @ git+https://github.com/winstxnhdw/CapGen#subdirectory=cli"

Now, you can run the CLI tool with the following command.

capgen -c srt -o ./result.srt --cuda < ~/Downloads/audio.mp3

The available options are as follows.

usage: capgen [-h] [-g] [-t] [-w] -c  -o  [file]

transcribe a compatible audio/video file into a chosen caption file format

positional arguments:
  file            the file path to a compatible audio/video

options:
  -h, --help      show this help message and exit
  -g, --cuda      whether to use CUDA for inference
  -c, --caption   the chosen caption file format
  -o, --output    the output file path

cpu:
  -t, --threads   the number of CPU threads
  -w, --workers   the number of CPU workers
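
If you want to call the CLI from a Python script, the options above can be assembled into an argument list. This is a minimal sketch. The function names `build_capgen_command` and `run_capgen` are hypothetical, and it assumes that `--threads` and `--workers` each take an integer value, which the help text implies but does not state.

```python
import subprocess
from typing import Optional


def build_capgen_command(
    caption_format: str,
    output_path: str,
    input_path: Optional[str] = None,
    cuda: bool = False,
    threads: Optional[int] = None,
    workers: Optional[int] = None,
) -> list[str]:
    """Assemble the argv list for capgen from the flags in the help text."""
    command = ["capgen", "--caption", caption_format, "--output", output_path]
    if cuda:
        command.append("--cuda")
    # Assumption: --threads and --workers accept an integer value.
    if threads is not None:
        command += ["--threads", str(threads)]
    if workers is not None:
        command += ["--workers", str(workers)]
    # The positional file is optional; when omitted, the input must be
    # supplied on stdin, as in the shell example above.
    if input_path is not None:
        command.append(input_path)
    return command


def run_capgen(command: list[str]) -> None:
    """Run capgen and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)
```

For example, `run_capgen(build_capgen_command("srt", "./result.srt", "audio.mp3", cuda=True))` would start a CUDA transcription, provided capgen is installed and on the PATH.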

Development

You can install the required dependencies for your editor with the following.

uv sync --all-packages

You can spin the server up locally with the following. You can access the Swagger UI at localhost:7860/api/docs.

docker build -f Dockerfile.build -t capgen .
docker run --rm -e SERVER_PORT=7860 -p 7860:7860 capgen
