A tiny, no-strings Python package for translating massive CSV/JSONL files,
with native support for pre-annotated fixed spans that the translator leaves unchanged.
The out-of-the-box features of bulk-translate are:
- ✅ Support of `spans` for annotation / optional translation.
- ✅ Native implementation of two translation modes:
  - `fast-mode`: exploits extra chars that could be used for grouping all the text parts into a single batch with further deconstruction.
  - `accurate`: performs individual translation of each text part.
- ✅ No strings: you're free to adopt any LM / LLM backend.
  - Supports `googletrans` by default.
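The difference between the two modes can be illustrated with a minimal sketch. It is not the package's implementation: the separator `SEP`, the helper names `translate_fast` / `translate_accurate`, and the stand-in `CountingTranslator` are all assumptions made for illustration. The idea is that the fast mode joins all text parts with an extra character, translates them in one call, and splits the result back, while the accurate mode calls the translator once per part.

```python
from typing import Callable, List

# Hypothetical separator character; an assumption of this sketch,
# not necessarily what bulk-translate uses internally.
SEP = "\n"


def translate_accurate(parts: List[str], translate: Callable[[str], str]) -> List[str]:
    """Accurate mode: one translator call per text part."""
    return [translate(part) for part in parts]


def translate_fast(parts: List[str], translate: Callable[[str], str], sep: str = SEP) -> List[str]:
    """Fast mode: join parts into one batch, translate once, split back."""
    if not parts:
        return []
    if any(sep in part for part in parts):
        raise ValueError("separator occurs inside a text part")
    restored = translate(sep.join(parts)).split(sep)
    if len(restored) != len(parts):
        # The translator did not preserve the separators:
        # fall back to translating each part individually.
        return translate_accurate(parts, translate)
    return restored


class CountingTranslator:
    """Stand-in for a real backend: upper-cases text and counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> str:
        self.calls += 1
        return text.upper()


if __name__ == "__main__":
    fast, accurate = CountingTranslator(), CountingTranslator()
    parts = ["hello", "big world", "bye"]
    print(translate_fast(parts, fast), fast.calls)
    print(translate_accurate(parts, accurate), accurate.calls)
```

The trade-off in this sketch is that the fast variant needs a single backend call but relies on the separator surviving translation, which is why it falls back to per-part translation when the number of parts does not match.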
From PyPI:

    pip install bulk-translate

or the latest version from here:

    pip install git+https://github.com/nicolay-r/bulk-translate

NOTE: Spans are supported only in JSON-lines format.

NOTE: Requires the `source_iter` package to be installed.
For the test.tsv example data, in which annotated entities are enclosed in square brackets, translation can be launched as follows:

    python -m bulk_translate.translate \
        --src "test/data/test.tsv" \
        --schema '{"translated":"{text}"}' \
        --adapter "dynamic:models/googletrans_310a.py:GoogleTranslateModel" \
        --output "test-translated.jsonl" \
        --batch-size 10 \
        %%m \
        --src "auto" \
        --dest "ru"

The pipeline construction components were taken from AREkit [github]
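To make the idea of fixed spans concrete, here is a minimal sketch of how entities enclosed in square brackets could be kept invariant while the surrounding text is translated. It is an illustration only: the regular expression, the helper names `split_parts` and `translate_keeping_spans`, and the upper-casing stand-in translator are assumptions, not the package's actual API or span format.

```python
import re
from typing import Callable, List

# Matches one bracketed span such as "[Alice]" (no nested brackets).
# The capturing group makes re.split keep the spans in its output.
SPAN = re.compile(r"(\[[^\[\]]*\])")


def split_parts(text: str) -> List[str]:
    """Split text into free-text parts and bracketed spans, in order."""
    return [part for part in SPAN.split(text) if part]


def translate_keeping_spans(text: str, translate: Callable[[str], str]) -> str:
    """Translate only the free-text parts; bracketed spans pass through unchanged."""
    out = []
    for part in split_parts(text):
        if SPAN.fullmatch(part):
            out.append(part)             # fixed span: left as-is
        else:
            out.append(translate(part))  # free text: sent to the translator
    return "".join(out)


if __name__ == "__main__":
    # str.upper stands in for a real translation backend.
    print(translate_keeping_spans("[Alice] met [Bob] in Paris.", str.upper))
```

The list returned by `split_parts` corresponds to the "text parts" that a batching strategy would then send to the translator, either one by one or grouped together.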


