Merged
Changes from 51 commits
Commits
61 commits
aefe97e
rules: fix typos
Aug 11, 2021
a1eca58
features: support characteristic(os/*) features
Aug 11, 2021
e797a67
features: define CHARACTERISTIC_OS constants for ease of use
Aug 11, 2021
06f8943
features: add format/pe and format/elf characteristics
Aug 11, 2021
20859d2
extractors: pefile: extract OS and format
Aug 11, 2021
97092c9
tests: assert absence of the wrong os/format
Aug 11, 2021
753b003
pep8
Aug 11, 2021
05f8e24
fixtures: add tests demonstrating extraction of features from ELF files
Aug 11, 2021
baaa8ba
scripts: add script to detect ELF OS
Aug 11, 2021
37bc47c
extractors: viv: extract from bytes not file path
Aug 11, 2021
7205862
helpers: move ELF and IDA helpers out of script and into common module
Aug 11, 2021
fa8b4a4
extractors: add common routine to extract OS from ELF
Aug 11, 2021
294f74b
extractors: viv: extract format and OS at all scopes
Aug 11, 2021
a7678e7
extractors: smda: extract format and OS characteristics at all scopes
Aug 11, 2021
769d354
detect-elf-os: remove extra print statement
Aug 11, 2021
c1910d4
move is_global_feature into capa.features.common
Aug 11, 2021
71d9ebd
extractors: ida: extract OS and file format characteristics at all sc…
Aug 11, 2021
34819b2
pep8
Aug 11, 2021
30d7425
changelog
Aug 11, 2021
d5c9a5c
mypy: ignore ida_loader
Aug 11, 2021
f013815
features: rename legacy term `arch` to `bitness`
Aug 16, 2021
ab1326f
features: move OS and Format to their own features, not characteristics
Aug 16, 2021
5405e18
features: move Format features to file scope
Aug 16, 2021
738fa91
fixtures: update tests to account for Format scope
Aug 16, 2021
8e689c3
features: add Arch feature at global scope
Aug 16, 2021
fd47b03
render: vverbose: don't render locations of global scope features
Aug 16, 2021
98c00bd
extractors: add missing global_.py files
Aug 16, 2021
0065876
extractors: ida: move os extraction to global module
Aug 17, 2021
92dfa99
extractors: log unsupported os/arch/format but don't except
Aug 17, 2021
909ffc1
Merge branch 'master' into feature-701
williballenthin Aug 17, 2021
ac5d163
pep8
Aug 17, 2021
0c3a38b
Merge branch 'feature-701' of github.com:fireeye/capa into feature-701
Aug 17, 2021
f1df29d
tests: xfail smda ELF API
Aug 18, 2021
a35f5a1
elf: detect FreeBSD via note
Aug 18, 2021
8960358
elf: add some doc
Aug 18, 2021
766ac7e
Merge branch 'master' of github.com:fireeye/capa into feature-701
Aug 18, 2021
249b849
pefile: extract Arch
Aug 18, 2021
e124115
Merge branch 'master' into feature-701
williballenthin Aug 18, 2021
f0a34fd
merge
Aug 18, 2021
cf17eba
Merge branch 'feature-701' of github.com:fireeye/capa into feature-701
Aug 18, 2021
45b6c8d
setup: bump SMDA dep ver
Aug 19, 2021
a96a5de
tests: re-enable SMDA ELF API tests
Aug 19, 2021
3cb7573
enable os/arch/format for capa explorer
mike-hunhoff Aug 19, 2021
dae7be0
elf: fix alignment calculation
williballenthin Aug 19, 2021
04cc94a
main: detect invalid arch and os
Aug 20, 2021
3eaeb53
Merge branch 'feature-701' of github.com:fireeye/capa into feature-701
Aug 20, 2021
aef03b5
elf: fix type error caught by mypy!
Aug 20, 2021
1b9a6c3
main: collect os/format/arch into metadata and render it
Aug 20, 2021
b6ab12d
Update capa/features/common.py
williballenthin Aug 23, 2021
a90e93e
Update capa/main.py
williballenthin Aug 23, 2021
2ba000a
Merge branch 'master' into feature-701
williballenthin Aug 23, 2021
c0fe042
changelog: tweak PR ref
Aug 23, 2021
6961fde
Merge branch 'feature-701' of github.com:fireeye/capa into feature-701
Aug 23, 2021
a1bf95e
features: formatting of OS constants
Aug 23, 2021
6482f67
elf: document unused OS constants
Aug 23, 2021
dab88e4
elf: add more explanation about ei_osabi
Aug 23, 2021
a729bdf
elf: more clearly set first detected OS
Aug 23, 2021
30a5493
tests: smda: remove unused import
Aug 23, 2021
fc73787
extractors: file extractor arg consistency via kwargs
Aug 23, 2021
a4b0954
viv: ignore mypy FP
Aug 23, 2021
56f9e16
tests: viv: disable ELF tests due to #735
Aug 23, 2021
3 changes: 3 additions & 0 deletions .github/mypy/mypy.ini
@@ -57,6 +57,9 @@ ignore_missing_imports = True
 [mypy-ida_funcs.*]
 ignore_missing_imports = True

+[mypy-ida_loader.*]
+ignore_missing_imports = True
+
 [mypy-PyQt5.*]
 ignore_missing_imports = True

5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -8,9 +8,14 @@
 - explorer: enforce max column width Features and Editor panes #691 @mike-hunhoff
 - explorer: add option to limit features to currently selected disassembly address #692 @mike-hunhoff
 - all: add support for ELF files #700 @Adir-Shemesh @TcM1911
+- rule format: add feature `format: ` for file format, like `format: pe` @williballenthin
+- rule format: add feature `arch: ` for architecture, like `arch: amd64` @williballenthin
+- rule format: add feature `os: ` for operating system, like `os: windows` #701 @williballenthin

 ### Breaking Changes

+- legacy term `arch` (i.e., "x32") is now called `bitness` @williballenthin
+
 ### New Rules (24)

 - collection/webcam/capture-webcam-image johnk3r
90 changes: 71 additions & 19 deletions capa/features/common.py
@@ -14,19 +14,14 @@

 import capa.engine
 import capa.features
+import capa.features.extractors.elf

 logger = logging.getLogger(__name__)
 MAX_BYTES_FEATURE_SIZE = 0x100

 # thunks may be chained so we specify a delta to control the depth to which these chains are explored
 THUNK_CHAIN_DEPTH_DELTA = 5

-# identifiers for supported architectures names that tweak a feature
-# for example, offset/x32
-ARCH_X32 = "x32"
-ARCH_X64 = "x64"
-VALID_ARCH = (ARCH_X32, ARCH_X64)
-

 def bytes_to_str(b: bytes) -> str:
     return str(codecs.encode(b, "hex").decode("utf-8"))
@@ -52,33 +47,33 @@ def escape_string(s: str) -> str:


 class Feature:
-    def __init__(self, value: Union[str, int, bytes], arch=None, description=None):
+    def __init__(self, value: Union[str, int, bytes], bitness=None, description=None):
         """
         Args:
           value (any): the value of the feature, such as the number or string.
-          arch (str): one of the VALID_ARCH values, or None.
-            When None, then the feature applies to any architecture.
-            Modifies the feature name from `feature` to `feature/arch`, like `offset/x32`.
+          bitness (str): one of the VALID_BITNESS values, or None.
+            When None, then the feature applies to any bitness.
+            Modifies the feature name from `feature` to `feature/bitness`, like `offset/x32`.
           description (str): a human-readable description that explains the feature value.
         """
         super(Feature, self).__init__()

-        if arch is not None:
-            if arch not in VALID_ARCH:
-                raise ValueError("arch '%s' must be one of %s" % (arch, VALID_ARCH))
-            self.name = self.__class__.__name__.lower() + "/" + arch
+        if bitness is not None:
+            if bitness not in VALID_BITNESS:
+                raise ValueError("bitness '%s' must be one of %s" % (bitness, VALID_BITNESS))
+            self.name = self.__class__.__name__.lower() + "/" + bitness
         else:
             self.name = self.__class__.__name__.lower()

         self.value = value
-        self.arch = arch
+        self.bitness = bitness
         self.description = description

     def __hash__(self):
-        return hash((self.name, self.value, self.arch))
+        return hash((self.name, self.value, self.bitness))

     def __eq__(self, other):
-        return self.name == other.name and self.value == other.value and self.arch == other.arch
+        return self.name == other.name and self.value == other.value and self.bitness == other.bitness

     def get_value_str(self) -> str:
         """
@@ -105,8 +100,8 @@ def evaluate(self, ctx: Dict["Feature", Set[int]]) -> "capa.engine.Result":
         return capa.engine.Result(self in ctx, self, [], locations=ctx.get(self, []))

     def freeze_serialize(self):
-        if self.arch is not None:
-            return (self.__class__.__name__, [self.value, {"arch": self.arch}])
+        if self.bitness is not None:
+            return (self.__class__.__name__, [self.value, {"bitness": self.bitness}])
         else:
             return (self.__class__.__name__, [self.value])
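
To make the `arch` to `bitness` rename concrete, here is a minimal standalone sketch of how a bitness qualifier affects the feature name and the serialized form. It does not import capa: `SketchFeature` and `Offset` are hypothetical stand-ins that copy only the logic visible in this diff, and the real classes carry more behavior (descriptions, evaluation, deserialization).

```python
from typing import Optional, Union

BITNESS_X32 = "x32"
BITNESS_X64 = "x64"
VALID_BITNESS = (BITNESS_X32, BITNESS_X64)


class SketchFeature:
    """Hypothetical stand-in that mirrors the naming logic shown in the diff."""

    def __init__(self, value: Union[str, int, bytes], bitness: Optional[str] = None):
        if bitness is not None:
            if bitness not in VALID_BITNESS:
                raise ValueError("bitness '%s' must be one of %s" % (bitness, VALID_BITNESS))
            # qualified name, e.g. "offset/x32"
            self.name = self.__class__.__name__.lower() + "/" + bitness
        else:
            # unqualified name applies to any bitness
            self.name = self.__class__.__name__.lower()
        self.value = value
        self.bitness = bitness

    def freeze_serialize(self):
        # the bitness travels as a keyword dict appended to the positional values
        if self.bitness is not None:
            return (self.__class__.__name__, [self.value, {"bitness": self.bitness}])
        return (self.__class__.__name__, [self.value])


class Offset(SketchFeature):
    """Hypothetical subclass used only to get a lowercase class name."""


plain = Offset(0x10)
qualified = Offset(0x10, bitness=BITNESS_X32)
print(plain.name, plain.freeze_serialize())
print(qualified.name, qualified.freeze_serialize())
```

Since the serialized key changes from `"arch"` to `"bitness"`, data frozen before this change would presumably need the old key handled or be regenerated; the diff shown here does not say which.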

@@ -131,6 +126,7 @@ def __init__(self, value: str, description=None):

 class Characteristic(Feature):
     def __init__(self, value: str, description=None):
+
         super(Characteristic, self).__init__(value, description=description)


Expand Down Expand Up @@ -260,3 +256,59 @@ def freeze_serialize(self):
@classmethod
def freeze_deserialize(cls, args):
return cls(*[codecs.decode(x, "hex") for x in args])


# identifiers for supported bitness names that tweak a feature
# for example, offset/x32
BITNESS_X32 = "x32"
BITNESS_X64 = "x64"
VALID_BITNESS = (BITNESS_X32, BITNESS_X64)


# other candidates here: https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#machine-types
ARCH_I386 = "i386"
ARCH_AMD64 = "amd64"
VALID_ARCH = (ARCH_I386, ARCH_AMD64)


class Arch(Feature):
def __init__(self, value: str, description=None):
assert value in VALID_ARCH
super(Arch, self).__init__(value, description=description)
self.name = "arch"


OS_WINDOWS = "windows"
OS_LINUX = "linux"
OS_MACOS = "macos"
VALID_OS = {os.value for os in capa.features.extractors.elf.OS}
VALID_OS.add(OS_WINDOWS)
VALID_OS.add(OS_LINUX)
VALID_OS.add(OS_MACOS)


class OS(Feature):
def __init__(self, value: str, description=None):
assert value in (VALID_OS)
super(OS, self).__init__(value, description=description)
self.name = "os"


FORMAT_PE = "pe"
FORMAT_ELF = "elf"
VALID_FORMAT = (FORMAT_PE, FORMAT_ELF)


class Format(Feature):
def __init__(self, value: str, description=None):
assert value in (VALID_FORMAT)
super(Format, self).__init__(value, description=description)
self.name = "format"


def is_global_feature(feature):
"""
is this a feature that is extracted at every scope?
today, these are OS and arch features.
"""
return isinstance(feature, (OS, Arch))
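
As a rough illustration of how the new `Arch`, `OS`, and `Format` features relate to `is_global_feature`, the sketch below re-declares simplified versions of the classes so it runs without capa installed. Two assumptions are made here and are not taken from the PR: the simplified `Feature` base drops `bitness` and `description`, and `VALID_OS` holds only the three string constants (the real set also includes the values of the ELF `OS` enum).

```python
ARCH_I386 = "i386"
ARCH_AMD64 = "amd64"
VALID_ARCH = (ARCH_I386, ARCH_AMD64)

OS_WINDOWS = "windows"
OS_LINUX = "linux"
OS_MACOS = "macos"
# assumption: trimmed to the three constants; the real VALID_OS is larger
VALID_OS = {OS_WINDOWS, OS_LINUX, OS_MACOS}

FORMAT_PE = "pe"
FORMAT_ELF = "elf"
VALID_FORMAT = (FORMAT_PE, FORMAT_ELF)


class Feature:
    """Simplified stand-in for the real base class (no bitness, no description)."""

    def __init__(self, value):
        self.name = self.__class__.__name__.lower()
        self.value = value

    def __hash__(self):
        return hash((self.name, self.value))

    def __eq__(self, other):
        return self.name == other.name and self.value == other.value


class Arch(Feature):
    def __init__(self, value):
        assert value in VALID_ARCH
        super().__init__(value)
        self.name = "arch"


class OS(Feature):
    def __init__(self, value):
        assert value in VALID_OS
        super().__init__(value)
        self.name = "os"


class Format(Feature):
    def __init__(self, value):
        assert value in VALID_FORMAT
        super().__init__(value)
        self.name = "format"


def is_global_feature(feature):
    # OS and Arch are extracted at every scope; Format stays at file scope
    return isinstance(feature, (OS, Arch))


# feature -> set of locations, in the shape an extractor might produce
features = {OS(OS_LINUX): {0x0}, Arch(ARCH_AMD64): {0x0}, Format(FORMAT_ELF): {0x0}}
global_features = sorted(f.name for f in features if is_global_feature(f))
print(global_features)
```

Because equality and hashing depend only on name and value, a freshly constructed `OS("linux")` can be looked up in the feature dictionary, which is what lets a rule condition like `os: linux` match at any scope.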
78 changes: 78 additions & 0 deletions capa/features/extractors/common.py
@@ -0,0 +1,78 @@
+import io
+import logging
+import binascii
+import contextlib
+
+import pefile
+
+import capa.features.extractors.elf
+import capa.features.extractors.pefile
+from capa.features.common import OS, FORMAT_PE, FORMAT_ELF, OS_WINDOWS, Arch, Format
+
+logger = logging.getLogger(__name__)
+
+
+def extract_format(buf):
+    if buf.startswith(b"MZ"):
+        yield Format(FORMAT_PE), 0x0
+    elif buf.startswith(b"\x7fELF"):
+        yield Format(FORMAT_ELF), 0x0
+    else:
+        # we likely end up here:
+        #  1. handling a file format (e.g. macho)
+        #
+        # for (1), this logic will need to be updated as the format is implemented.
+        logger.debug("unsupported file format: %s", binascii.hexlify(buf[:4]).decode("ascii"))
+        return
+
+
+def extract_arch(buf):
+    if buf.startswith(b"MZ"):
+        yield from capa.features.extractors.pefile.extract_file_arch(pefile.PE(data=buf), "hack: path not provided")
+
+    elif buf.startswith(b"\x7fELF"):
+        with contextlib.closing(io.BytesIO(buf)) as f:
+            arch = capa.features.extractors.elf.detect_elf_arch(f)
+
+        if arch == "unknown":
+            logger.debug("unsupported arch: %s", arch)
+            return
+
+        yield Arch(arch), 0x0
+
+    else:
+        # we likely end up here:
+        #  1. handling shellcode, or
+        #  2. handling a new file format (e.g. macho)
+        #
+        # for (1) we can't do much - it's shellcode and all bets are off.
+        # we could maybe accept a further CLI argument to specify the arch,
+        # but i think this would be rarely used.
+        # rules that rely on arch conditions will fail to match on shellcode.
+        #
+        # for (2), this logic will need to be updated as the format is implemented.
+        logger.debug("unsupported file format: %s, will not guess Arch", binascii.hexlify(buf[:4]).decode("ascii"))
+        return
+
+
+def extract_os(buf):
+    if buf.startswith(b"MZ"):
+        yield OS(OS_WINDOWS), 0x0
+    elif buf.startswith(b"\x7fELF"):
+        with contextlib.closing(io.BytesIO(buf)) as f:
+            os = capa.features.extractors.elf.detect_elf_os(f)
+
+        yield OS(os), 0x0
+    else:
+        # we likely end up here:
+        #  1. handling shellcode, or
+        #  2. handling a new file format (e.g. macho)
+        #
+        # for (1) we can't do much - it's shellcode and all bets are off.
+        # we could maybe accept a further CLI argument to specify the OS,
+        # but i think this would be rarely used.
+        # rules that rely on OS conditions will fail to match on shellcode.
+        #
+        # for (2), this logic will need to be updated as the format is implemented.
+        logger.debug("unsupported file format: %s, will not guess OS", binascii.hexlify(buf[:4]).decode("ascii"))
+        return
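
The three extractors in this file share one pattern: dispatch on the leading magic bytes, yield `(feature, offset)` pairs, and yield nothing (after a debug log) when the format is unrecognized. The sketch below shows that pattern in isolation. `sketch_extract_format` and the sample buffers are hypothetical, and plain strings stand in for capa's `Format` objects so the example runs without capa or pefile installed.

```python
import binascii
import logging

logger = logging.getLogger(__name__)

FORMAT_PE = "pe"
FORMAT_ELF = "elf"


def sketch_extract_format(buf: bytes):
    """Yield (format, offset) pairs based on magic bytes; yield nothing if unrecognized."""
    if buf.startswith(b"MZ"):
        yield FORMAT_PE, 0x0
    elif buf.startswith(b"\x7fELF"):
        yield FORMAT_ELF, 0x0
    else:
        # shellcode or a not-yet-supported format: log and produce no feature
        logger.debug("unsupported file format: %s", binascii.hexlify(buf[:4]).decode("ascii"))


# hypothetical sample buffers; only the first few bytes matter here
pe_like = b"MZ\x90\x00" + b"\x00" * 60
elf_like = b"\x7fELF\x02\x01\x01\x00"
shellcode = b"\x90\x90\xcc\xc3"

print(list(sketch_extract_format(pe_like)))
print(list(sketch_extract_format(elf_like)))
print(list(sketch_extract_format(shellcode)))
```

Yielding nothing rather than raising matches the commit "extractors: log unsupported os/arch/format but don't except": callers can iterate the generator unconditionally, and rules that depend on the missing feature simply do not match.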