A fast, cross-platform, CPU-first video/audio transcriber (English only) that generates caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces. A pip-installable
offline CLI tool with CUDA support is also provided. Voice Activity Detection (VAD) preprocessing is always enabled.
- Python 3.9
- 4 GB RAM
Simply cURL the endpoint as shown below. Currently, the only available caption formats are srt, vtt and txt.
curl "https://winstxnhdw-CapGen.hf.space/api/v2/transcribe?caption_format=$CAPTION_FORMAT" \
-F "file=@$AUDIO_FILE_PATH"
You can also redirect the output to a file.
curl "https://winstxnhdw-CapGen.hf.space/api/v2/transcribe?caption_format=$CAPTION_FORMAT" \
-F "file=@$AUDIO_FILE_PATH" | jq -r ".result" > result.srt
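For scripted use, the jq step above could be mirrored in Python. The following is a minimal sketch, assuming the response body is a JSON object with a string `result` field (as the `jq -r ".result"` filter implies). The helper names `extract_caption` and `save_caption` are hypothetical and not part of CapGen.

```python
import json
from pathlib import Path


def extract_caption(response_body: str) -> str:
    """Return the caption text from a JSON response body.

    Assumption: the body is a JSON object with a string `result` field,
    as implied by the `jq -r ".result"` filter.
    """
    payload = json.loads(response_body)
    if not isinstance(payload, dict) or not isinstance(payload.get("result"), str):
        raise ValueError("response does not contain a string 'result' field")
    return payload["result"]


def save_caption(response_body: str, output_path: str) -> Path:
    """Write the extracted caption text to `output_path` and return the path."""
    path = Path(output_path)
    path.write_text(extract_caption(response_body), encoding="utf-8")
    return path
```

Calling `save_caption(body, "result.srt")` with the raw response text would then play the same role as the redirect to `result.srt`.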
You can stream the captions in real time with the following.
curl -N "https://winstxnhdw-CapGen.hf.space/api/v2/transcribe/stream?caption_format=$CAPTION_FORMAT" \
-F "file=@$AUDIO_FILE_PATH"
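When calling either endpoint from a script, it may help to validate the caption format before sending a request. The sketch below builds the two URLs used in the cURL commands above. The function name `build_transcribe_url` and the constants are hypothetical, and the set of accepted formats is taken from the list stated earlier (srt, vtt and txt).

```python
from urllib.parse import urlencode

# Base URL and paths are copied from the cURL examples above.
BASE_URL = "https://winstxnhdw-CapGen.hf.space/api/v2"

# Caption formats listed as available in this document.
CAPTION_FORMATS = ("srt", "vtt", "txt")


def build_transcribe_url(caption_format: str, stream: bool = False) -> str:
    """Return the transcription URL for the given caption format.

    Raises ValueError for a format that this document does not list.
    """
    if caption_format not in CAPTION_FORMATS:
        raise ValueError(
            f"unsupported caption format {caption_format!r}; "
            f"expected one of {', '.join(CAPTION_FORMATS)}"
        )

    path = "transcribe/stream" if stream else "transcribe"
    query = urlencode({"caption_format": caption_format})
    return f"{BASE_URL}/{path}?{query}"
```

The returned string can be passed to cURL or to any HTTP client that supports multipart file uploads.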
CapGen is available as a CLI tool with CUDA support. You can install it with pip.
pip install "capgen-cli @ git+https://github.com/winstxnhdw/CapGen#subdirectory=cli"
You may also install capgen with the necessary CUDA binaries.
pip install "capgen-cli[cuda] @ git+https://github.com/winstxnhdw/CapGen#subdirectory=cli"
Now, you can run the CLI tool with the following command.
capgen -c srt -o ./result.srt --cuda < ~/Downloads/audio.mp3
usage: capgen [-h] [-g] [-t] [-w] -c -o [file]

transcribe a compatible audio/video file into a chosen caption file format

positional arguments:
  file           the file path to a compatible audio/video

options:
  -h, --help     show this help message and exit
  -g, --cuda     whether to use CUDA for inference
  -c, --caption  the chosen caption file format
  -o, --output   the output file path

cpu:
  -t, --threads  the number of CPU threads
  -w, --workers  the number of CPU workers
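To call the CLI from a Python script, one option is to wrap it with `subprocess`. This is a minimal sketch that uses only the `--caption`, `--output` and `--cuda` flags shown in the help text and feeds the input file through standard input, as in the shell example above. The function names `build_capgen_command` and `transcribe` are hypothetical, and the sketch assumes `capgen` is already installed and on the PATH.

```python
import subprocess
from typing import List


def build_capgen_command(
    caption_format: str, output_path: str, use_cuda: bool = False
) -> List[str]:
    """Build the argument list for a capgen invocation."""
    command = ["capgen", "--caption", caption_format, "--output", output_path]
    if use_cuda:
        command.append("--cuda")
    return command


def transcribe(
    input_path: str, caption_format: str, output_path: str, use_cuda: bool = False
) -> None:
    """Run capgen with the input file piped to its standard input.

    Raises subprocess.CalledProcessError if capgen exits with a non-zero status.
    """
    command = build_capgen_command(caption_format, output_path, use_cuda)
    with open(input_path, "rb") as media:
        subprocess.run(command, stdin=media, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.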
You can install the required dependencies for your editor with the following.
uv sync --all-packages
You can spin the server up locally with the following. You can access the Swagger UI at localhost:7860/api/docs.
docker build -f Dockerfile.build -t capgen .
docker run --rm -e SERVER_PORT=7860 -p 7860:7860 capgen