CapGen

A fast, cross-platform, CPU-first video/audio English-only transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces. A pip-installable offline CLI tool with CUDA support is also provided. Voice Activity Detection (VAD) preprocessing is always enabled.

Requirements

Python 3.9
4 GB RAM

Usage (API)

Simply cURL the endpoint as shown below. Currently, the only available caption formats are srt, vtt and txt.

curl "https://winstxnhdw-CapGen.hf.space/api/v2/transcribe?caption_format=$CAPTION_FORMAT" \
  -F "file=@$AUDIO_FILE_PATH"

You can also redirect the output to a file.

curl "https://winstxnhdw-CapGen.hf.space/api/v2/transcribe?caption_format=$CAPTION_FORMAT" \
  -F "file=@$AUDIO_FILE_PATH" | jq -r ".result" > result.srt
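For callers who prefer Python over cURL, the same request can be sketched with only the standard library. This is a minimal sketch and not part of CapGen: the helper names `build_url`, `encode_multipart` and `transcribe` are hypothetical, and the `result` key in the JSON response is assumed from the `jq` filter above.

```python
import json
import mimetypes
import urllib.parse
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "https://winstxnhdw-CapGen.hf.space/api/v2/transcribe"
CAPTION_FORMATS = ("srt", "vtt", "txt")


def build_url(caption_format: str) -> str:
    """Return the endpoint URL with the caption_format query parameter set."""
    if caption_format not in CAPTION_FORMATS:
        raise ValueError(f"caption_format must be one of {CAPTION_FORMATS}")
    return f"{BASE_URL}?{urllib.parse.urlencode({'caption_format': caption_format})}"


def encode_multipart(field: str, filename: str, payload: bytes):
    """Encode one file as multipart/form-data, like curl's -F "file=@...".

    Returns the request body and the matching Content-Type header value.
    Sketch only: the filename is not escaped, so avoid quotes in it.
    """
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path: str, caption_format: str = "srt") -> str:
    """Upload the file and return the captions from the assumed "result" key."""
    path = Path(audio_path)
    body, content_type = encode_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        build_url(caption_format),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["result"]
```

Writing the captions to disk is then a one-liner such as `Path("result.srt").write_text(transcribe("audio.mp3", "srt"))`.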

You can stream the captions in real time with the following.

curl -N "https://winstxnhdw-CapGen.hf.space/api/v2/transcribe/stream?caption_format=$CAPTION_FORMAT" \
  -F "file=@$AUDIO_FILE_PATH"
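To consume the streaming endpoint from Python, the response can be decoded incrementally as bytes arrive. The sketch below is an illustration under assumptions: `iter_text_chunks` is a hypothetical helper, and it treats the stream as plain UTF-8 text because the wire format of the streamed captions is not documented here.

```python
import codecs
import io
from typing import Iterator


def iter_text_chunks(stream: io.BufferedIOBase, chunk_size: int = 1024) -> Iterator[str]:
    """Yield decoded text from a binary stream as soon as bytes are available."""
    # An incremental decoder keeps multi-byte UTF-8 characters intact
    # even when they are split across two reads.
    decoder = codecs.getincrementaldecoder("utf-8")()

    while True:
        # read1 performs at most one underlying read, so it returns as soon
        # as some data is available instead of waiting for chunk_size bytes.
        chunk = stream.read1(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text

    # Flush any bytes still buffered in the decoder.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

The response object returned by `urllib.request.urlopen` for a POST to the `/transcribe/stream` URL can be passed as `stream`. Printing each chunk with `print(chunk, end="", flush=True)` gives roughly the same unbuffered behaviour as `curl -N`.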

Usage (CLI)

CapGen is available as a CLI tool with CUDA support. You can install it with pip.

pip install "capgen-cli @ git+https://github.com/winstxnhdw/CapGen#subdirectory=cli"

You may also install capgen with the necessary CUDA binaries.

pip install "capgen-cli[cuda] @ git+https://github.com/winstxnhdw/CapGen#subdirectory=cli"

Now, you can run the CLI tool with the following command.

capgen -c srt -o ./result.srt --cuda < ~/Downloads/audio.mp3

usage: capgen [-h] [-g] [-t] [-w] -c  -o  [file]

transcribe a compatible audio/video file into a chosen caption file format

positional arguments:
  file            the file path to a compatible audio/video

options:
  -h, --help      show this help message and exit
  -g, --cuda      whether to use CUDA for inference
  -c, --caption   the chosen caption file format
  -o, --output    the output file path

cpu:
  -t, --threads   the number of CPU threads
  -w, --workers   the number of CPU workers
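When calling the CLI from a script, the options listed above map onto an argument list fairly directly. The following is a minimal sketch, not part of CapGen: `build_capgen_command` and `run_capgen` are hypothetical helpers, and it assumes that `--threads` and `--workers` each take an integer value, which the help text implies but does not state.

```python
import subprocess
from typing import List, Optional


def build_capgen_command(
    caption: str,
    output: str,
    cuda: bool = False,
    threads: Optional[int] = None,
    workers: Optional[int] = None,
) -> List[str]:
    """Assemble the capgen argument list from the options in its help text."""
    command = ["capgen", "--caption", caption, "--output", output]
    if cuda:
        command.append("--cuda")
    # Assumption: --threads and --workers each take an integer value.
    if threads is not None:
        command += ["--threads", str(threads)]
    if workers is not None:
        command += ["--workers", str(workers)]
    return command


def run_capgen(audio_path: str, caption: str, output: str, **options) -> None:
    """Run capgen, feeding the audio file through stdin as in the shell example."""
    with open(audio_path, "rb") as audio:
        # check=True raises CalledProcessError if capgen exits with an error.
        subprocess.run(
            build_capgen_command(caption, output, **options),
            stdin=audio,
            check=True,
        )
```

For example, `run_capgen("audio.mp3", "srt", "./result.srt", cuda=True)` mirrors the shell command shown earlier. Note that Python's `open` does not expand `~`, so pass a full path or use `os.path.expanduser`.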

Development

You can install the required dependencies for your editor with the following.

uv sync --all-packages

You can spin the server up locally with the following. You can access the Swagger UI at localhost:7860/api/docs.

docker build -f Dockerfile.build -t capgen .
docker run --rm -e SERVER_PORT=7860 -p 7860:7860 capgen
