Decompiler outputs
This page describes various RetDec decompilation outputs.
A default decompilation (without any special options listed below) of an input file input.exe produces the following output files:
- input.exe.dsm: Disassembly output in our custom format. Instruction mnemonics are in the default Capstone format.
- input.exe.bc: The final product of the Core decompilation part in the LLVM bitcode format.
- input.exe.ll: Human-readable disassembly of LLVM bitcode in the LLVM IR format.
- input.exe.config.json: Metadata produced by the decompilation process.
- input.exe.c: The decompiled C code. This is the main output.
As you can see, the output file names are generated simply by adding proper suffixes to the input file name: <input_file>.{dsm, bc, ll, config.json, c}.
The following options of the retdec-decompiler application control the output generation process:
- -o FILE, --output FILE
  If specified, the main decompilation output is stored to FILE instead of <input_file>.c. Furthermore, FILE (without a potential suffix) is used as a base name to generate other output file names: <FILE_w/o_suffix>.{dsm, bc, ll, config.json}.
- -f OUTPUT_FORMAT, --output-format OUTPUT_FORMAT
  The default plain option generates the main decompilation output directly as a high-level-language source code into an associated text file (e.g. C source code into a *.c file). The json and json-human options generate the output source code as a stream of lexer tokens, plus additional information. See the section below for a detailed format description. The suffix of the main decompilation output file is changed to .json.
- --cleanup
  Removes temporary files created during decompilation. Only the main decompilation output file and the disassembly file are preserved.
Run retdec-decompiler --help for more information on all the available options.
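The naming rules and options above can be sketched in a few lines of Python. This is only an illustration, not part of RetDec: the helper names expected_outputs and build_command are hypothetical, the reading of "without a potential suffix" as "drop the last extension" is an assumption, and so is passing the input file as a positional argument. Only the plain output format is covered, and the effect of --cleanup on the remaining files is not modeled.

```python
import os

# Suffixes of the additional output files listed above (plain format).
SIDE_SUFFIXES = ("dsm", "bc", "ll", "config.json")


def expected_outputs(input_file, output=None):
    """Return the output paths expected for one decompilation (hypothetical helper)."""
    if output is None:
        # Default: suffixes are appended to the full input file name.
        base, main = input_file, input_file + ".c"
    else:
        # Assumption: "without a potential suffix" means dropping the last extension.
        base, main = os.path.splitext(output)[0], output
    return [f"{base}.{suffix}" for suffix in SIDE_SUFFIXES] + [main]


def build_command(input_file, output=None, output_format=None, cleanup=False):
    """Assemble an argument list using only the options described above."""
    cmd = ["retdec-decompiler"]
    if output is not None:
        cmd += ["-o", output]
    if output_format is not None:
        cmd += ["-f", output_format]
    if cleanup:
        cmd.append("--cleanup")
    # Assumption: the input file is given as a positional argument.
    cmd.append(input_file)
    return cmd
```

The list returned by build_command could be handed to subprocess.run, and expected_outputs could then be used to check which files appeared.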
Parsing high-level-language source code is not trivial. However, 3rd-party reversing tools might need to do just that in order to make use of output from RetDec. Furthermore, additional meta-information may be required to enhance user experience or automated analysis - information that is hard to convey in a traditional high-level-language source code. Usage examples:
- Syntax highlighting in RetDec IDA plugin.
- Relations between decompiled output lines/elements and the original disassembly instructions in RetDec IDA plugin and RetDec Radare2 plugin.
In order to make these applications possible, RetDec offers an option (see the previous section) to generate its output as a sequence of annotated lexer tokens into a JSON format. Two JSON flavors can be generated:
- Human-readable JSON containing proper indentation (option json-human).
- Machine-readable JSON without any indentation (option json).
In order to parse both flavors with a single parser implementation, they both use the same keys and values.
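As a minimal sketch of what this means for a consumer, the Python snippet below parses an indented and a compact rendering of the same document with one function. The sample document is hypothetical, and the two strings only emulate the two flavors with the standard json module; they are not guaranteed to match RetDec's output byte for byte.

```python
import json

# Hypothetical sample document using the keys described on this page.
document = {
    "language": "C",
    "tokens": [
        {"addr": "0x8048577", "kind": "keyw", "val": "return"},
        {"kind": "ws", "val": " "},
        {"kind": "l_int", "val": "0"},
        {"kind": "punc", "val": ";"},
    ],
}

# Emulate the two flavors: compact (json) and indented (json-human).
compact = json.dumps(document, separators=(",", ":"))
human = json.dumps(document, indent=4)


def load_tokens(text):
    """Parse either flavor and return (language, list of token objects)."""
    data = json.loads(text)
    return data["language"], data["tokens"]


# The same parser handles both renderings.
language, tokens = load_tokens(human)
```

Because whitespace between JSON tokens is insignificant, any standard JSON parser should treat the two flavors identically.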
The current JSON schema is the following:
{
"language" : "<language_ID>",
"tokens" :
[
{
"addr" : "<address_format>",
"kind" : "<kind_values>",
"val" : "<value>"
},
// ...
]
}
- All values are of the string data type.
- The language key identifies the high-level language being tokenized. Possible <language_ID> values:

  | Value | Description |
  |-------|-------------|
  | C | C language |
- Source code is serialized as an array of token objects stored under the tokens key.
- Token object contains the following entries:
  - Assembly address associated with the token with key addr and value in the prefixed hexadecimal format (e.g. 0x8048577).
  - Value val which holds the actual token string as would appear in the source code.
  - Token type with key kind and the following possible values:

    | Value | Description | Example(s) |
    |-------|-------------|------------|
    | nl | New line. | "\n" |
    | ws | Any consecutive sequence of white spaces. | " " |
    | punc | A single punctuation character. | "(" ")" "{" "}" "[" "]" ";" |
    | op | Operator. | "==" "-" "+" "*" "->" "." |
    | i_var | Global/Local variable identifier. | "global_var" |
    | i_mem | Structure/Class member identifier. | "entry_1" |
    | i_lab | Label identifier. | "label_0x1234" |
    | i_fnc | Function identifier. | "ackermann" |
    | i_arg | Function argument identifier. | "argv" |
    | keyw | High-level-language keyword. | "while" |
    | type | Data type. | "uint64_t" |
    | preproc | Preprocessor directive. | "#include" |
    | inc | String used in an #include preprocessor directive. Including <>. | "<stdlib.h>" |
    | l_bool | Boolean literal. | "true" "false" |
    | l_int | Integer literal. Including potential prefixes and suffixes. | "123" "0x213A" "1234567890123456789LL" |
    | l_fp | Floating point literal. Including potential prefixes and suffixes. | "3.14" "123.456e-67" |
    | l_str | String literal. Including properly escaped "". | "\"ackerman( %d , %d ) = %d\\n\"" |
    | l_sym | Symbolic literal. | "UNDEFINED" |
    | l_ptr | Pointer literal. | "NULL" |
    | cmnt | Comment. Including delimiter like // or /* */. | "// Detected compiler/packer: gcc (4.7.2)" |
- Token kind and val entries must always be used together, i.e. one is never used without the other.
- Concatenating all the val entries produces exactly the same string as would be generated using the plain format option.
- Address entry does not have to be present in every token object. If it is missing, the token is associated with the last address entry.
- Address entry can be used without kind and val entries. In such a case, it effectively sets associated address for the upcoming tokens.

      { "addr" : "0x80498e8" },
      // sequence of tokens associated with address 0x80498e8
      { "addr" : "0x80498f4" },
      // sequence of tokens associated with address 0x80498f4
- Not all tokens must (or can) be associated with an assembly address. Such tokens are associated with an empty address:

      { "addr" : "0x80498e8" },
      // sequence of tokens associated with address 0x80498e8
      { "addr" : "" },
      // sequence of tokens unassociated with any address
- Token-to-address association is not intrinsically line-based. For example, the following line:

      printf("ackerman( %d , %d ) = %d\n", x, y, result);

  can be broken down to several pieces, each associated with a different assembly instruction:
  - printf( - associated with CALL instruction.
  - "ackerman( %d , %d ) = %d\n" - associated with LOAD instruction loading the call argument.
  - , x - associated with LOAD instruction loading the call argument.
  - , y - associated with LOAD instruction loading the call argument.
  - , result - associated with LOAD instruction loading the call argument.
  - ); - associated with CALL instruction.
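The rules in this list can be sketched as a small Python consumer. This is an illustration only, not a RetDec API: the function names and the sample token list are hypothetical, and treating tokens that precede any addr entry as unassociated is an assumption. The sketch carries the last address forward, lets address-only objects reset it, rebuilds the plain source from the val entries, and merges consecutive tokens that share an address.

```python
def resolve_addresses(tokens):
    """Yield (addr, kind, val) for every real token, carrying the last address forward."""
    current = ""  # Assumption: tokens seen before any addr entry are unassociated.
    for token in tokens:
        if "addr" in token:
            current = token["addr"]  # may be "" (no associated address)
        if "kind" in token:  # kind and val always appear together
            yield current, token["kind"], token["val"]


def reconstruct_source(tokens):
    """Concatenate all val entries to rebuild the plain-format text."""
    return "".join(token["val"] for token in tokens if "val" in token)


def group_by_address(tokens):
    """Merge consecutive tokens sharing an address into (addr, text) pieces."""
    groups = []
    for addr, _kind, val in resolve_addresses(tokens):
        if groups and groups[-1][0] == addr:
            groups[-1] = (addr, groups[-1][1] + val)
        else:
            groups.append((addr, val))
    return groups


# Hypothetical token sequence for the statement "result = x;".
tokens = [
    {"addr": "0x80498e8"},
    {"kind": "i_var", "val": "result"},
    {"kind": "ws", "val": " "},
    {"kind": "op", "val": "="},
    {"kind": "ws", "val": " "},
    {"addr": "0x80498f4", "kind": "i_var", "val": "x"},
    {"addr": ""},
    {"kind": "punc", "val": ";"},
    {"kind": "nl", "val": "\n"},
]

source = reconstruct_source(tokens)
pieces = group_by_address(tokens)
```

Grouping by address rather than by line reflects the last point above: one source line may yield several pieces, each tied to a different instruction.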