A utility for automated translation of strings for localizable software.
Features:
- Uses the OpenAI API to translate from the source language (default: US English) to the target language(s).
- Supports incremental translation of new or changed strings from files that were translated previously.
The OPENAI_API_KEY environment variable must be set. Alternatively, you can add
it to a .env file.
autotranslate [options]
Options:
- --config <path>: Optional path to the config file to use. If not specified, defaults to autotranslate.json. See below for details on the config file.
- --update-hashes: Update hashes in target files without generating new translations. Use this when you have existing translations and want to start using autotranslate without retranslating everything.
- --verbose or -v: Show details of the configuration and the progress of the translations. Default is to run silently unless there's an error. Verbose mode may also be enabled in the config file.
- --watch: Run continuously, watching for modifications to the source-language strings file and updating the target-language files as needed. Useful in dev environments to automatically keep translations up to date as original strings are edited.
Strings are defined in strings files. There is a file for the source language that is edited by hand. Strings may be added to and removed from the source-language strings file, or existing strings may be edited.
Each target language also has its own file. The target-language files are updated by autotranslate. They may also be edited by hand if developers want to modify any of the translations.
The config file contains a JSON object with the following structure or an array of JSON objects each with this structure. Some properties are optional as noted below.
{
  "batchSize": <number>,
  "instructions": "<path>",
  "source": {
    "file": "<path>",
    "format": "<format name>",
    "language": "<language name>",
    "outputs": [
      {
        "file": "<path>",
        "format": "<format name>"
      }
    ]
  },
  "targets": [
    {
      "file": "<path>",
      "format": "<format name>",
      "instructions": "<path>",
      "language": "<language name>",
      "outputs": [
        {
          "file": "<path>",
          "format": "<format name>"
        }
      ]
    }
  ],
  "verbose": <boolean>
}
Explanations of each of the properties, with the property names in JavaScript notation:
batchSize (optional) is the number of strings to translate at a time. Default
is 15. If this is set to 1, batching is disabled. Batching is faster and cheaper,
but the translations it produces can differ from those generated one string at a time.
instructions (optional) is the path of a text file with instructions to include
with translation requests for all languages. For example, it can include
definitions of project-specific terminology, or hints about the level of
formality to use in translations.
source.file (required) is the path of the source strings file.
source.format (optional) is the format of the source strings file, as described
in the Formats section below. Default is csv.
source.language (optional) is the name of the source language. Default is
English.
source.outputs (optional) is a list of files in alternate formats to generate
from the source language. See "Outputs" below.
targets is a list of information about each target language.
targets[].language (required) is the English name of the language, e.g.,
"Spanish".
targets[].file (required) is the path of the target strings file for the
language.
targets[].format (optional) is the format of the target strings file, as
described in the Formats section below. Default is csv.
targets[].instructions (optional) is the path of a text file with instructions
to include with translation requests for this language. For example, it can
contain a list of specific translations to use for project-specific terminology.
targets[].outputs (optional) is a list of files in alternate formats to generate
for the target language. See "Outputs" below.
verbose (optional) enables verbose logging of progress and configuration.
Verbose mode may also be enabled via command-line option.
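To illustrate batchSize from the property list above: batching can be pictured as splitting the strings that need translation into fixed-size groups, one group per request. The following is a minimal sketch under that assumption; make_batches is a hypothetical helper name, not part of autotranslate.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(items: Sequence[T], batch_size: int = 15) -> Iterator[List[T]]:
    """Yield consecutive groups of at most batch_size items.

    Hypothetical helper for illustration. A batch_size of 1 yields one
    item per group, which corresponds to batching being disabled.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


keys = [f"KEY_{i}" for i in range(32)]
print([len(batch) for batch in make_batches(keys)])  # prints [15, 15, 2]
```

With the default of 15, 32 pending strings would need three requests instead of 32.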
If the config file contains an array of objects, autotranslate generates translations for the source and target files in each object. Each object can have different instructions and outputs. This can be used to generate translations for projects whose strings are spread across multiple files.
Don't list the same outputs or target files in multiple places; that'll likely lead to inconsistent results.
Translations aren't shared across configurations.
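A config file with the same target or output path listed twice can be caught before running a translation. The sketch below is one possible check, not something autotranslate is documented to do. The function names are hypothetical, and paths are compared as literal strings.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def normalize_configs(parsed: Any) -> List[Dict[str, Any]]:
    """Treat a single config object as a one-element list."""
    return parsed if isinstance(parsed, list) else [parsed]


def written_files(config: Dict[str, Any]) -> List[str]:
    """Collect target files plus all source and target outputs."""
    paths = [out["file"] for out in config["source"].get("outputs", [])]
    for target in config.get("targets", []):
        paths.append(target["file"])
        paths.extend(out["file"] for out in target.get("outputs", []))
    return paths


def duplicate_files(configs: List[Dict[str, Any]]) -> List[str]:
    """Return paths that appear more than once across all configs."""
    counts = Counter(p for c in configs for p in written_files(c))
    return sorted(p for p, n in counts.items() if n > 1)


example = json.loads("""
[
  {"source": {"file": "src/web-strings-en.csv"},
   "targets": [{"file": "src/strings-es.csv", "language": "Spanish"}]},
  {"source": {"file": "src/mobile-strings-en.csv"},
   "targets": [{"file": "src/strings-es.csv", "language": "Spanish"}]}
]
""")
print(duplicate_files(normalize_configs(example)))  # prints ['src/strings-es.csv']
```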
Here's a simple example for a project with separate web and mobile strings files. In this example, the language-independent instructions are different for the two (perhaps the one for the mobile strings says to use shorter words), but the language-specific instructions are the same (perhaps it specifies translations for particular English words).
[
  {
    "instructions": "web-instructions.txt",
    "source": {
      "file": "src/web-strings-en.csv"
    },
    "targets": [
      {
        "file": "src/web-strings-es.csv",
        "format": "csv",
        "instructions": "instructions-es.txt",
        "language": "Spanish"
      }
    ]
  },
  {
    "instructions": "mobile-instructions.txt",
    "source": {
      "file": "src/mobile-strings-en.csv"
    },
    "targets": [
      {
        "file": "src/mobile-strings-es.csv",
        "format": "csv",
        "instructions": "instructions-es.txt",
        "language": "Spanish"
      }
    ]
  }
]
Strings files can have different formats, but regardless of format, they can have the following information for each string:
- Key: A unique identifier for the string.
- Text: The text of the string in the file's language.
- Description: Optionally present in the source-language strings file. Additional information about the string to help produce better translations.
- Hash: Always present in the target-language strings file. A hash (using the 32-bit xxHash algorithm and encoded in zero-padded lower-case hexadecimal) of the text and description in the source language. This is used to detect when the source-language text or description has been edited and autotranslate needs to generate fresh translations.
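The shape of the hash value (32 bits, zero-padded, lower-case hexadecimal) can be sketched as follows. Two things here are assumptions made for illustration. First, CRC-32 from the Python standard library stands in for xxHash, so the digests will not match the ones autotranslate writes. Second, the text and description are joined with a NUL separator; the way autotranslate actually combines them is not specified above.

```python
import zlib


def source_hash(text: str, description: str = "") -> str:
    """Return an 8-character, zero-padded, lower-case hex digest.

    Assumptions for illustration: CRC-32 stands in for 32-bit xxHash, and
    the text and description are joined with a NUL separator.
    """
    payload = f"{text}\0{description}".encode("utf-8")
    return format(zlib.crc32(payload) & 0xFFFFFFFF, "08x")


recorded = source_hash("Open file", "Menu item")
current = source_hash("Open File", "Menu item")  # source text was edited
print(len(recorded), recorded != current)  # prints 8 True
```

A mismatch between the recorded and current digests is what signals that a string needs a fresh translation.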
The main source and target files are considered the source of truth. Autotranslate will always read and write them.
In addition, autotranslate can write strings for the source and target languages to additional output files.
Each file specification, whether it's the main source/target file or an additional
output, has a file value with the path of the output file and a format value
that controls what kind of file is generated.
List of supported formats, each of which is described in more detail below:
- csv
- java-properties
- javascript-const
CSV files have three columns. They always start with a header line. The files use standard CSV formatting, with double quotes omitted if they aren't required.
For the source language CSV, the three columns are:
- Key
- Text
- Description
For the target language CSVs, the three columns are:
- Key
- Text
- Hash
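A target-language CSV in this layout can be written and read with Python's csv module, whose QUOTE_MINIMAL mode adds double quotes only where a field requires them. This is a sketch, not autotranslate's own code. It assumes the header names match the column names listed above, and the hash values are made up.

```python
import csv
import io
from typing import Dict, List


def write_target_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize target-language rows with a header line.

    The header names are assumed to match the column names listed above.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["Key", "Text", "Hash"],
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_target_csv(content: str) -> Dict[str, Dict[str, str]]:
    """Parse target-language CSV text into a dict indexed by key."""
    reader = csv.DictReader(io.StringIO(content))
    return {row["Key"]: row for row in reader}


content = write_target_csv([
    {"Key": "ABC", "Text": "Otro texto", "Hash": "0a1b2c3d"},
    {"Key": "DEF", "Text": "Hola, mundo", "Hash": "00ff00ff"},
])
print(content)
```

Only the second row's text is quoted in the output, because it contains a comma.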
The java-properties format produces a Java properties file for use as a PropertyResourceBundle. The keys are the string keys and the values are the text, with special characters properly quoted. If a string has a description, it is included in the source language's file as a comment on the line before the key/value pair. For target-language files, the hash is included as a comment on the line before the key/value pair.
Java properties files must use UTF-8 encoding. An encoding header line is included in target-language files; it's ignored in source-language files.
Example (source language):
# encoding: UTF-8
ABC=Some text\: it''s quoted
# Description for key DEF
DEF=Some other text
The javascript-const format produces a JavaScript source file that exports a
constant named strings, an object whose keys are the string keys and whose
values are the text. If a string has a description, it is included as a comment
on the line before the key/value pair, but only for the source language. For
target-language files, the hash is included as a comment on the line before the
key/value pair.
Example (source language):
export const strings = {
  ABC: 'Some text',
  // Description for key DEF
  DEF: 'Some other text',
};
When you run autotranslate, it does the following:
- Reads the source-language strings file.
- Calculates the hash of each string+description in the source-language file.
- For each of the target languages:
  - Reads the target language's strings file, if it exists.
  - Removes the rows for any keys that don't exist in the source-language file.
  - If a key doesn't exist in the target-language file, OR if the hash that was recorded in the target-language file doesn't match the current hash from the source-language file, generates a new translation using the OpenAI API.
  - Writes the updated target-language file.
The above is a conceptual description; in reality, some of the operations may be batched or done in parallel.
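The per-language steps above can be sketched as a single pass over the source strings. This is an illustrative model, not autotranslate's code. Hashes are assumed to be computed already, the translate callable is a stand-in for the OpenAI request, and the names update_target and Entries are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# key -> (text, hash). For the source, the hash is computed from the current
# text and description; for a target, it is the hash recorded in the file.
Entries = Dict[str, Tuple[str, str]]


def update_target(
    source: Entries,
    target: Entries,
    translate: Callable[[str], str],
) -> Tuple[Entries, List[str]]:
    """Return the updated target entries and the keys that were translated."""
    updated: Entries = {}
    translated_keys: List[str] = []
    for key, (text, current_hash) in source.items():
        existing = target.get(key)
        if existing is not None and existing[1] == current_hash:
            updated[key] = existing  # unchanged: keep the old translation
        else:
            updated[key] = (translate(text), current_hash)  # new or edited
            translated_keys.append(key)
    # Keys present only in the target are dropped: they are never copied.
    return updated, translated_keys


source = {"ABC": ("Some text", "11111111"), "DEF": ("Other text", "22222222")}
target = {"ABC": ("Otro texto", "11111111"), "OLD": ("Viejo", "33333333")}
updated, translated = update_target(source, target, lambda s: f"<{s}>")
print(translated)  # prints ['DEF']
```

In this example ABC is kept because its recorded hash still matches, DEF is translated because it is missing from the target, and OLD is removed because it no longer exists in the source.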
Translations are generated using the OpenAI Responses API:
https://platform.openai.com/docs/api-reference/responses
The value of the "instructions" field in the API request is constructed from the following pieces:
- A preamble that's built into autotranslate.
- An optional file with project-specific, but not language-specific, instructions.
- An optional file with language-specific instructions.
The instructions are the same for each OpenAI API request. The prompt changes from request to request and includes the text and description, properly delimited as described in the preamble.
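Assembling the instructions from those three pieces might look like the sketch below. The preamble text is a placeholder, since the real one is built into autotranslate. Joining the pieces with a blank line is also an assumption, and build_instructions is a hypothetical name.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Placeholder text; the real preamble is built into autotranslate.
PREAMBLE = "You are translating user-interface strings."


def build_instructions(
    project_file: Optional[str] = None,
    language_file: Optional[str] = None,
    preamble: str = PREAMBLE,
) -> str:
    """Join the preamble with whichever optional instruction files are set."""
    parts = [preamble]
    for path in (project_file, language_file):
        if path is not None:
            parts.append(Path(path).read_text(encoding="utf-8").strip())
    return "\n\n".join(parts)  # blank-line separator is an assumption


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "instructions.txt"
    project.write_text("Use informal language.\n", encoding="utf-8")
    print(build_instructions(project_file=str(project)))
```

Because the result depends only on the config and not on the strings being translated, it can be built once per target language and reused for every request.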