Helio edited this page Dec 4, 2016 · 1 revision

About:

BinSourcerer is an assembly-to-source-code matching framework written in Python. Its main purpose is to recreate the functionality that RE-Google provided, since the Google Code Search API has been discontinued. The plugin can be used for code search on GitHub and the Black Duck Open Hub Code Search, as well as for function tagging. It generates a disassembly feature file that can be used in various binary analyses. Moreover, the framework's functionality can easily be extended.

Installation:

1. Install or update the IDAPython plugin from http://code.google.com/p/idapython/.

2. Download and install the Beautiful Soup (bs4) Python package from http://www.crummy.com/software/BeautifulSoup/bs4/download/.

3. Install PySide from https://pypi.python.org/pypi/PySide.

(Important note: If you're using IDA Pro 6.6 or later, do not install PySide, because it comes integrated with IDA.)

4. Install Requests from http://www.python-requests.org/.

5. Unzip the BinSourcerer.zip file and copy the IdaProTextExtractor_Plugin.py file into the IDA Pro plugins folder.

6. Open a disassembly and call the plugin either from the menu or with the Ctrl+Shift+J shortcut.
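
Before opening the plugin, it can help to confirm that the Python packages from steps 2 and 4 are importable by the interpreter IDA uses. The following is a minimal sketch, not part of BinSourcerer; the helper name `missing_modules` is hypothetical, and the module names `bs4` and `requests` are assumed from the package names above.

```python
import importlib

# Assumed import names for Beautiful Soup 4 and Requests.
REQUIRED_MODULES = ["bs4", "requests"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Running this from the IDAPython console (rather than a system shell) checks the environment that the plugin will actually see.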

Functionality:

•The “Function Selection” list on the main window shows the list of available functions in the disassembly.

•Select the functions for which you would like to generate a feature file. You can select multiple functions using Ctrl+click or Shift+click. Checking the “Select all functions” box will generate a complete feature file for the current disassembly.

•In the “Feature Extraction” section, set the path and specify the file name for the feature file. The default extension is .xtrak.

•The .xtrak file stores all the available features extracted from a single assembly function. The features are stored as a list of tuples of the form [('tag', 'value')]. The following tags are assigned to function features:

  1. n: function name
  2. c: constant
  3. cx: number of constants
  4. s: string
  5. sx: number of strings
  6. p: prototype
  7. a: argument
  8. r: return type
  9. m: number of instructions
  10. g: number of arguments
  11. b: size of arguments in bytes
  12. l: size of local variables in bytes
  13. f: function flags
  14. i: imported function
  15. ix: number of imports
  16. d: a function that exists in the malware dictionary
  17. dx: number of functions with the MAL tags
  18. t: API tag
  19. tx: number of API tags
  20. o: code references from this function
  21. ox: number of code references from this function
  22. k: functions called from this function
  23. kx: number of function calls from this function

Some of these features are meant for offline matching of functions in future versions of the prototype.
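
To illustrate the tag/value layout described above, here is a minimal sketch that groups feature values by tag. It assumes the feature list is written as a Python literal exactly in the [('tag', 'value')] form; the real .xtrak layout may differ, and the sample values and the helper name `group_features` are invented for illustration.

```python
import ast
from collections import defaultdict


def group_features(text):
    """Parse a "[('tag', 'value'), ...]" literal and group the values by tag."""
    grouped = defaultdict(list)
    for tag, value in ast.literal_eval(text):
        grouped[tag].append(value)
    return dict(grouped)


# Invented sample data: one function name, one constant, two strings, a string count.
sample = "[('n', 'sub_401000'), ('c', '0x5A4D'), ('s', 'cmd.exe'), ('s', 'open'), ('sx', '2')]"
features = group_features(sample)
print(features["s"])  # ['cmd.exe', 'open']
```

`ast.literal_eval` is used instead of `eval` so that a feature file cannot execute arbitrary code when it is read.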

•Click on the “API Tagging” button to tag functions according to the groups of API function calls. This functionality works on Windows disassemblies.

•The “API Tagging” feature renames functions according to the API function calls in the body of each function. A three-character suffix is added to the function name for each API category. The following categories are available in the current prototype:

  1. LCH (Launcher): This tag shows that the function might try to manipulate the resource section of the file. This is common behavior in malware that loads additional modules or packed binaries from the resource section.
  2. Different forms of injection attacks: PSJ (Process Injection), DLJ (DLL Injection), DRJ (Direct Injection), PRP (Process Replacement), HKJ (Hook Injection), ACJ (APC Injection), AUJ (APC User space), AKJ (APC Kernel space).
  3. WNT (Windows Networking): Common APIs for networking code.
  4. ADB (Anti Debugging): Common APIs for detecting an attached debugger.
  5. AVM (Anti-virtual Machine).
  6. REG (Registry): Registry-manipulating functions.
  7. NET (Winsock, WinINet, Cache): Networking functions. Other networking tags: FTP, GPR.
  8. File processing and links: URL, DIR, FIL, SRC (Search).
  9. MTX (Mutex), PIP (Pipe), MOD (Process memory modification), VIR (Virtual), CRT (Critical Section).
  10. ENP (Enumeration).
  11. HAS (Hashing), CRY (Crypto), CER (Certificates).
  12. SRV (Services), OSI (Operating System Information).
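
The renaming scheme can be sketched as follows. This is an illustration only: the API-to-category table is a small hypothetical sample (the plugin's real table is not listed on this page), and the underscore separator between the name and its suffixes is an assumption.

```python
# Hypothetical API-to-category table; the tags come from the list above,
# but which APIs the plugin maps to each tag is assumed here.
API_TAGS = {
    "IsDebuggerPresent": "ADB",
    "RegOpenKeyExA": "REG",
    "RegSetValueExA": "REG",
    "CreateMutexA": "MTX",
    "FindResourceA": "LCH",
    "LoadResource": "LCH",
}


def tagged_name(name, called_apis, table=API_TAGS):
    """Append one three-character suffix per API category found in called_apis."""
    tags = []
    for api in called_apis:
        tag = table.get(api)
        if tag and tag not in tags:  # one suffix per category, in first-seen order
            tags.append(tag)
    return "_".join([name] + tags)


print(tagged_name("sub_401000", ["RegOpenKeyExA", "RegSetValueExA", "IsDebuggerPresent"]))
# sub_401000_REG_ADB
```

A function that calls no API in the table keeps its original name.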

•Click on the “Feature Extraction” button to generate the output feature file. The output file will be saved in the specified path.

•At this point, you can close the plugin and launch the “main.py” file. This can be done outside or inside IDA Pro.

•The main form has three sections, namely Data Extractor, Code Repositories, and Analyzer. The framework allows us to define different sources of data, code search engines, and analyzers.

•Click on the Configure button and set the paths to the Results Folder and the Black List File.

•The search configurations (delay, filters, black list, proxy, etc.) can also be set using the Configuration window.

•There are currently two code search plugins: one for GitHub and one for the Black Duck Open Hub Code Search.

•Click on the Start button and open the feature file (Features.xtrak) generated by the IDA Pro plugin.

•The list of functions will be shown in the table. Select the functions that should be matched with source code.

•The “Select Features” window allows the user to specify which features should be included in the output query. Constants, Strings, and Imported functions are selected by default.
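
As a rough illustration of how selected features could feed a search query, the sketch below keeps only the default feature tags (c, s, i) and joins their values into a quoted query string. The function name `build_query`, the query syntax, and the way the black list is applied (dropping matching terms) are all assumptions, not BinSourcerer's actual behavior.

```python
# Default selection from the text above: constants, strings, imported functions.
DEFAULT_TAGS = ("c", "s", "i")


def build_query(features, selected_tags=DEFAULT_TAGS, blacklist=()):
    """Join the values of the selected tags into one space-separated, quoted query."""
    terms = []
    for tag, value in features:
        # Skip unselected tags, blacklisted terms, and duplicates.
        if tag in selected_tags and value not in blacklist and value not in terms:
            terms.append(value)
    return " ".join('"%s"' % term for term in terms)


# Invented sample features for a single function.
sample_features = [
    ("n", "sub_401000"),
    ("c", "0x5A4D"),
    ("s", "cmd.exe"),
    ("i", "CreateFileA"),
    ("s", "open"),
]
print(build_query(sample_features, blacklist={"open"}))
# "0x5A4D" "cmd.exe" "CreateFileA"
```

Dropping very common terms through the black list keeps the query specific enough to match a small number of projects.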

•Click on the Start button to begin the search. The Report Viewer will list the output of the online analysis. It will also display the API tags and a short description of the potential context in which those APIs are used.

•The online search process depends heavily on the network state and, in the case of Black Duck, on the load on the Open Hub Code Search. You might need to repeat the process if the service is not responsive.
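
Repeating a search when the service does not respond can be automated with a small retry wrapper. This is a generic sketch and not part of the framework: `retry` and its parameters are hypothetical, and `search` stands for any callable that performs one search attempt and raises `IOError` on failure (the exceptions raised by Requests derive from `IOError`).

```python
import time


def retry(search, attempts=3, delay_seconds=5.0):
    """Call search() until it succeeds or `attempts` tries have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return search()
        except IOError as error:
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay_seconds)  # wait before the next try
    raise last_error
```

A fixed delay is the simplest choice; a longer or growing delay may be kinder to a heavily loaded service.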

•The generated HTML file shows the function names, query elements, matched projects, and source files. It might also include offline analysis results and API tags. The results of the offline analysis are based on an API search of common malware functions. If API tagging was not performed and the user did not select "Use offline analyzer for malware", the offline results will not be displayed in the report.
