Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
264 changes: 0 additions & 264 deletions Documentation/bpf/verifier.rst
Original file line number Diff line number Diff line change
Expand Up @@ -347,270 +347,6 @@ However, only the value of register ``r1`` is important to successfully finish
verification. The goal of the liveness tracking algorithm is to spot this fact
and figure out that both states are actually equivalent.

Data structures
~~~~~~~~~~~~~~~

Liveness is tracked using the following data structures::

enum bpf_reg_liveness {
REG_LIVE_NONE = 0,
REG_LIVE_READ32 = 0x1,
REG_LIVE_READ64 = 0x2,
REG_LIVE_READ = REG_LIVE_READ32 | REG_LIVE_READ64,
REG_LIVE_WRITTEN = 0x4,
REG_LIVE_DONE = 0x8,
};

struct bpf_reg_state {
...
struct bpf_reg_state *parent;
...
enum bpf_reg_liveness live;
...
};

struct bpf_stack_state {
struct bpf_reg_state spilled_ptr;
...
};

struct bpf_func_state {
struct bpf_reg_state regs[MAX_BPF_REG];
...
struct bpf_stack_state *stack;
}

struct bpf_verifier_state {
struct bpf_func_state *frame[MAX_CALL_FRAMES];
struct bpf_verifier_state *parent;
...
}

* ``REG_LIVE_NONE`` is an initial value assigned to ``->live`` fields upon new
verifier state creation;

* ``REG_LIVE_WRITTEN`` means that the value of the register (or stack slot) is
defined by some instruction verified between this verifier state's parent and
verifier state itself;

* ``REG_LIVE_READ{32,64}`` means that the value of the register (or stack slot)
is read by a some child state of this verifier state;

* ``REG_LIVE_DONE`` is a marker used by ``clean_verifier_state()`` to avoid
processing same verifier state multiple times and for some sanity checks;

* ``->live`` field values are formed by combining ``enum bpf_reg_liveness``
values using bitwise or.

Register parentage chains
~~~~~~~~~~~~~~~~~~~~~~~~~

In order to propagate information between parent and child states, a *register
parentage chain* is established. Each register or stack slot is linked to a
corresponding register or stack slot in its parent state via a ``->parent``
pointer. This link is established upon state creation in ``is_state_visited()``
and might be modified by ``set_callee_state()`` called from
``__check_func_call()``.

The rules for correspondence between registers / stack slots are as follows:

* For the current stack frame, registers and stack slots of the new state are
linked to the registers and stack slots of the parent state with the same
indices.

* For the outer stack frames, only callee saved registers (r6-r9) and stack
slots are linked to the registers and stack slots of the parent state with the
same indices.

* When function call is processed a new ``struct bpf_func_state`` instance is
allocated, it encapsulates a new set of registers and stack slots. For this
new frame, parent links for r6-r9 and stack slots are set to nil, parent links
for r1-r5 are set to match caller r1-r5 parent links.

This could be illustrated by the following diagram (arrows stand for
``->parent`` pointers)::

... ; Frame #0, some instructions
--- checkpoint #0 ---
1 : r6 = 42 ; Frame #0
--- checkpoint #1 ---
2 : call foo() ; Frame #0
... ; Frame #1, instructions from foo()
--- checkpoint #2 ---
... ; Frame #1, instructions from foo()
--- checkpoint #3 ---
exit ; Frame #1, return from foo()
3 : r1 = r6 ; Frame #0 <- current state

+-------------------------------+-------------------------------+
| Frame #0 | Frame #1 |
Checkpoint +-------------------------------+-------------------------------+
#0 | r0 | r1-r5 | r6-r9 | fp-8 ... |
+-------------------------------+
^ ^ ^ ^
| | | |
Checkpoint +-------------------------------+
#1 | r0 | r1-r5 | r6-r9 | fp-8 ... |
+-------------------------------+
^ ^ ^
|_______|_______|_______________
| | |
nil nil | | | nil nil
| | | | | | |
Checkpoint +-------------------------------+-------------------------------+
#2 | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
+-------------------------------+-------------------------------+
^ ^ ^ ^ ^
nil nil | | | | |
| | | | | | |
Checkpoint +-------------------------------+-------------------------------+
#3 | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
+-------------------------------+-------------------------------+
^ ^
nil nil | |
| | | |
Current +-------------------------------+
state | r0 | r1-r5 | r6-r9 | fp-8 ... |
+-------------------------------+
\
r6 read mark is propagated via these links
all the way up to checkpoint #1.
The checkpoint #1 contains a write mark for r6
because of instruction (1), thus read propagation
does not reach checkpoint #0 (see section below).

Liveness marks tracking
~~~~~~~~~~~~~~~~~~~~~~~

For each processed instruction, the verifier tracks read and written registers
and stack slots. The main idea of the algorithm is that read marks propagate
back along the state parentage chain until they hit a write mark, which 'screens
off' earlier states from the read. The information about reads is propagated by
function ``mark_reg_read()`` which could be summarized as follows::

mark_reg_read(struct bpf_reg_state *state, ...):
parent = state->parent
while parent:
if state->live & REG_LIVE_WRITTEN:
break
if parent->live & REG_LIVE_READ64:
break
parent->live |= REG_LIVE_READ64
state = parent
parent = state->parent

Notes:

* The read marks are applied to the **parent** state while write marks are
applied to the **current** state. The write mark on a register or stack slot
means that it is updated by some instruction in the straight-line code leading
from the parent state to the current state.

* Details about REG_LIVE_READ32 are omitted.

* Function ``propagate_liveness()`` (see section :ref:`read_marks_for_cache_hits`)
might override the first parent link. Please refer to the comments in the
``propagate_liveness()`` and ``mark_reg_read()`` source code for further
details.

Because stack writes could have different sizes ``REG_LIVE_WRITTEN`` marks are
applied conservatively: stack slots are marked as written only if write size
corresponds to the size of the register, e.g. see function ``save_register_state()``.

Consider the following example::

0: (*u64)(r10 - 8) = 0 ; define 8 bytes of fp-8
--- checkpoint #0 ---
1: (*u32)(r10 - 8) = 1 ; redefine lower 4 bytes
2: r1 = (*u32)(r10 - 8) ; read lower 4 bytes defined at (1)
3: r2 = (*u32)(r10 - 4) ; read upper 4 bytes defined at (0)

As stated above, the write at (1) does not count as ``REG_LIVE_WRITTEN``. Should
it be otherwise, the algorithm above wouldn't be able to propagate the read mark
from (3) to checkpoint #0.

Once the ``BPF_EXIT`` instruction is reached ``update_branch_counts()`` is
called to update the ``->branches`` counter for each verifier state in a chain
of parent verifier states. When the ``->branches`` counter reaches zero the
verifier state becomes a valid entry in a set of cached verifier states.

Each entry of the verifier states cache is post-processed by a function
``clean_live_states()``. This function marks all registers and stack slots
without ``REG_LIVE_READ{32,64}`` marks as ``NOT_INIT`` or ``STACK_INVALID``.
Registers/stack slots marked in this way are ignored in function ``stacksafe()``
called from ``states_equal()`` when a state cache entry is considered for
equivalence with a current state.

Now it is possible to explain how the example from the beginning of the section
works::

0: call bpf_get_prandom_u32()
1: r1 = 0
2: if r0 == 0 goto +1
3: r0 = 1
--- checkpoint[0] ---
4: r0 = r1
5: exit

* At instruction #2 branching point is reached and state ``{ r0 == 0, r1 == 0, pc == 4 }``
is pushed to states processing queue (pc stands for program counter).

* At instruction #4:

* ``checkpoint[0]`` states cache entry is created: ``{ r0 == 1, r1 == 0, pc == 4 }``;
* ``checkpoint[0].r0`` is marked as written;
* ``checkpoint[0].r1`` is marked as read;

* At instruction #5 exit is reached and ``checkpoint[0]`` can now be processed
by ``clean_live_states()``. After this processing ``checkpoint[0].r1`` has a
read mark and all other registers and stack slots are marked as ``NOT_INIT``
or ``STACK_INVALID``

* The state ``{ r0 == 0, r1 == 0, pc == 4 }`` is popped from the states queue
and is compared against a cached state ``{ r1 == 0, pc == 4 }``, the states
are considered equivalent.

.. _read_marks_for_cache_hits:

Read marks propagation for cache hits
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Another point is the handling of read marks when a previously verified state is
found in the states cache. Upon cache hit verifier must behave in the same way
as if the current state was verified to the program exit. This means that all
read marks, present on registers and stack slots of the cached state, must be
propagated over the parentage chain of the current state. Example below shows
why this is important. Function ``propagate_liveness()`` handles this case.

Consider the following state parentage chain (S is a starting state, A-E are
derived states, -> arrows show which state is derived from which)::

r1 read
<------------- A[r1] == 0
C[r1] == 0
S ---> A ---> B ---> exit E[r1] == 1
|
` ---> C ---> D
|
` ---> E ^
|___ suppose all these
^ states are at insn #Y
|
suppose all these
states are at insn #X

* Chain of states ``S -> A -> B -> exit`` is verified first.

* While ``B -> exit`` is verified, register ``r1`` is read and this read mark is
propagated up to state ``A``.

* When chain of states ``C -> D`` is verified the state ``D`` turns out to be
equivalent to state ``B``.

* The read mark for ``r1`` has to be propagated to state ``C``, otherwise state
``C`` might get mistakenly marked as equivalent to state ``E`` even though
values for register ``r1`` differ between ``C`` and ``E``.

Understanding eBPF verifier messages
====================================

Expand Down
53 changes: 27 additions & 26 deletions include/linux/bpf_verifier.h
Original file line number Diff line number Diff line change
Expand Up @@ -26,28 +26,6 @@
/* Patch buffer size */
#define INSN_BUF_SIZE 32

/* Liveness marks, used for registers and spilled-regs (in stack slots).
* Read marks propagate upwards until they find a write mark; they record that
* "one of this state's descendants read this reg" (and therefore the reg is
* relevant for states_equal() checks).
* Write marks collect downwards and do not propagate; they record that "the
* straight-line code that reached this state (from its parent) wrote this reg"
* (and therefore that reads propagated from this state or its descendants
* should not propagate to its parent).
* A state with a write mark can receive read marks; it just won't propagate
* them to its parent, since the write mark is a property, not of the state,
* but of the link between it and its parent. See mark_reg_read() and
* mark_stack_slot_read() in kernel/bpf/verifier.c.
*/
enum bpf_reg_liveness {
REG_LIVE_NONE = 0, /* reg hasn't been read or written this branch */
REG_LIVE_READ32 = 0x1, /* reg was read, so we're sensitive to initial value */
REG_LIVE_READ64 = 0x2, /* likewise, but full 64-bit content matters */
REG_LIVE_READ = REG_LIVE_READ32 | REG_LIVE_READ64,
REG_LIVE_WRITTEN = 0x4, /* reg was written first, screening off later reads */
REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
};

#define ITER_PREFIX "bpf_iter_"

enum bpf_iter_state {
Expand Down Expand Up @@ -212,8 +190,6 @@ struct bpf_reg_state {
* allowed and has the same effect as bpf_sk_release(sk).
*/
u32 ref_obj_id;
/* parentage chain for liveness checking */
struct bpf_reg_state *parent;
/* Inside the callee two registers can be both PTR_TO_STACK like
* R1=fp-8 and R2=fp-8, but one of them points to this function stack
* while another to the caller's stack. To differentiate them 'frameno'
Expand All @@ -226,7 +202,6 @@ struct bpf_reg_state {
* patching which only happens after main verification finished.
*/
s32 subreg_def;
enum bpf_reg_liveness live;
/* if (!precise && SCALAR_VALUE) min/max/tnum don't affect safety */
bool precise;
};
Expand Down Expand Up @@ -445,6 +420,7 @@ struct bpf_verifier_state {

bool speculative;
bool in_sleepable;
bool cleaned;

/* first and last insn idx of this verifier state */
u32 first_insn_idx;
Expand Down Expand Up @@ -665,6 +641,7 @@ struct bpf_subprog_info {
/* 'start' has to be the first field otherwise find_subprog() won't work */
u32 start; /* insn idx of function entry point */
u32 linfo_idx; /* The idx to the main_prog->aux->linfo */
u32 postorder_start; /* The idx to the env->cfg.insn_postorder */
u16 stack_depth; /* max. stack depth used by this function */
u16 stack_extra;
/* offsets in range [stack_depth .. fastcall_stack_off)
Expand Down Expand Up @@ -744,6 +721,8 @@ struct bpf_scc_info {
struct bpf_scc_visit visits[];
};

struct bpf_liveness;

/* single container for all structs
* one verifier_env per bpf_check() call
*/
Expand Down Expand Up @@ -794,7 +773,10 @@ struct bpf_verifier_env {
struct {
int *insn_state;
int *insn_stack;
/* vector of instruction indexes sorted in post-order */
/*
* vector of instruction indexes sorted in post-order, grouped by subprogram,
* see bpf_subprog_info->postorder_start.
*/
int *insn_postorder;
int cur_stack;
/* current position in the insn_postorder vector */
Expand Down Expand Up @@ -842,9 +824,11 @@ struct bpf_verifier_env {
struct bpf_insn insn_buf[INSN_BUF_SIZE];
struct bpf_insn epilogue_buf[INSN_BUF_SIZE];
struct bpf_scc_callchain callchain_buf;
struct bpf_liveness *liveness;
/* array of pointers to bpf_scc_info indexed by SCC id */
struct bpf_scc_info **scc_info;
u32 scc_cnt;
u32 cs_must_writes;
};

static inline struct bpf_func_info_aux *subprog_aux(struct bpf_verifier_env *env, int subprog)
Expand Down Expand Up @@ -1065,4 +1049,21 @@ void print_verifier_state(struct bpf_verifier_env *env, const struct bpf_verifie
void print_insn_state(struct bpf_verifier_env *env, const struct bpf_verifier_state *vstate,
u32 frameno);

struct bpf_subprog_info *bpf_find_containing_subprog(struct bpf_verifier_env *env, int off);
int bpf_jmp_offset(struct bpf_insn *insn);
int bpf_insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2]);
void bpf_fmt_stack_mask(char *buf, ssize_t buf_sz, u64 stack_mask);
bool bpf_calls_callback(struct bpf_verifier_env *env, int insn_idx);

int bpf_stack_liveness_init(struct bpf_verifier_env *env);
void bpf_stack_liveness_free(struct bpf_verifier_env *env);
int bpf_update_live_stack(struct bpf_verifier_env *env);
int bpf_mark_stack_read(struct bpf_verifier_env *env, u32 frameno, u32 insn_idx, u64 mask);
void bpf_mark_stack_write(struct bpf_verifier_env *env, u32 frameno, u64 mask);
int bpf_reset_stack_write_marks(struct bpf_verifier_env *env, u32 insn_idx);
int bpf_commit_stack_write_marks(struct bpf_verifier_env *env);
int bpf_live_stack_query_init(struct bpf_verifier_env *env, struct bpf_verifier_state *st);
bool bpf_stack_slot_alive(struct bpf_verifier_env *env, u32 frameno, u32 spi);
void bpf_reset_live_stack_callchain(struct bpf_verifier_env *env);

#endif /* _LINUX_BPF_VERIFIER_H */
2 changes: 1 addition & 1 deletion kernel/bpf/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
endif
CFLAGS_core.o += -Wno-override-init $(cflags-nogcse-yy)

obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o liveness.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
Expand Down
Loading
Loading