|  | 1 | +List of Bugs uncovered in Rust via arithmetic overflow checking | 
|  | 2 | +=============================================================== | 
|  | 3 | +This document is a list of bugs that were uncovered during the | 
|  | 4 | +implementation and deployment of arithmetic overflow checking. | 
|  | 5 | +This list is restricted solely to *legitimate* bugs. Cases | 
|  | 6 | +where the overflow was benign (e.g. the computed value is | 
|  | 7 | +unused), transient (e.g. the computed wrapped value is | 
|  | 8 | +guaranteed to be brought back into the original range, such as | 
|  | 9 | +in `unsigned - 1 + provably_positive`), or silly (random | 
|  | 10 | +non-functional code in the tests or documentation) are not | 
|  | 11 | +included in the list. | 
|  | 12 | +However, extremely rare or obscure corner cases are considered | 
|  | 13 | +legitimate bugs. (We begin with such a case.) | 
|  | 14 | + | 
|  | 15 | + 1. `impl core::iter::RandomAccessIter for core::iter::Rev` | 
|  | 16 | + | 
|  | 17 | +    If one calls `iter.idx(index)` with `index >= amt` (where | 
|  | 18 | +    `amt` is the number of indexable elements), then it calls | 
|  | 19 | +    the wrapped inner iterator with a wrapped-around value. | 
|  | 20 | +    The contract for `idx` does say that it does need to | 
|  | 21 | +    handle out-of-bounds inputs, so this appeared benign at | 
|  | 22 | +    first, but there is the corner case of an iterator that | 
|  | 23 | +    actually covers the whole range of indices, which would | 
|  | 24 | +    then return `Some(_)` here when (pnkfelix thinks) `None` should be expected. | 
|  | 25 | + | 
|  | 26 | +    reference: | 
|  | 27 | +    https://github.com/rust-lang/rust/pull/22532#issuecomment-75168901 | 
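
The corner case in item 1 can be sketched outside the standard library. This is a minimal model, not the actual `Rev` implementation: `inner_idx`, `rev_idx_wrapping`, and `rev_idx_checked` are hypothetical names, and the index translation `amt - index - 1` is an assumption about how a reversed index maps onto the inner iterator.

```rust
/// Hypothetical stand-in for the inner iterator's `idx`: it accepts every
/// `usize` index, modelling an iterator that covers the whole index range.
fn inner_idx(i: usize) -> Option<usize> {
    Some(i)
}

/// Translation with wrapping arithmetic, mirroring what unchecked
/// subtraction computes when `index >= amt`.
fn rev_idx_wrapping(amt: usize, index: usize) -> Option<usize> {
    inner_idx(amt.wrapping_sub(index).wrapping_sub(1))
}

/// Translation with checked arithmetic: an out-of-bounds `index`
/// yields `None` instead of being forwarded as a wrapped value.
fn rev_idx_checked(amt: usize, index: usize) -> Option<usize> {
    amt.checked_sub(index)?.checked_sub(1).and_then(inner_idx)
}
```

With `amt = 10`, both functions map index `3` to inner index `6`. For index `10`, the wrapping version forwards `usize::MAX` to the inner iterator and gets `Some(_)` back, while the checked version returns `None`.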
|  | 28 | + | 
|  | 29 | + 2. `std::sys::windows::time::SteadyTime` | 
|  | 30 | + | 
|  | 31 | +    `fn ns` was converting a tick count `t` to nanoseconds | 
|  | 32 | +    via the computation `t * 1_000_000_000 / frequency()`; | 
|  | 33 | +    but the multiplication there can overflow, thus losing | 
|  | 34 | +    the high-order bits. | 
|  | 35 | + | 
|  | 36 | +    Full disclosure: This bug was known prior to landing | 
|  | 37 | +    arithmetic overflow checks, and filed as: | 
|  | 38 | + | 
|  | 39 | +    https://github.com/rust-lang/rust/issues/17845 | 
|  | 40 | + | 
|  | 41 | +    Although it was filed, it was left unfixed for months, | 
|  | 42 | +    even though the overflow would start | 
|  | 43 | +    occurring after 2 hours of machine uptime, according to: | 
|  | 44 | + | 
|  | 45 | +    https://github.com/rust-lang/rust/pull/22788 | 
|  | 46 | + | 
|  | 47 | +    pnkfelix included it on this list because having arithmetic | 
|  | 48 | +    overflow checking forces such bugs to be fixed in some | 
|  | 49 | +    manner rather than ignored. | 
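
One way to avoid the overflow in item 2 is to convert whole seconds and the sub-second remainder separately. The sketch below is an illustration under assumptions, not necessarily the fix that landed: the function names are hypothetical, the tick count is assumed to be a `u64`, and `frequency` is assumed to be small enough that `rem * NANOS_PER_SEC` fits in a `u64`.

```rust
const NANOS_PER_SEC: u64 = 1_000_000_000;

/// Naive conversion `t * 1_000_000_000 / frequency`; `checked_mul`
/// surfaces the overflow as `None` instead of silently losing bits.
fn ticks_to_ns_naive(ticks: u64, frequency: u64) -> Option<u64> {
    ticks.checked_mul(NANOS_PER_SEC).map(|n| n / frequency)
}

/// Split conversion: whole seconds first, then the remaining ticks.
/// `rem < frequency`, so the second product stays small as long as
/// `frequency` is well below `u64::MAX / NANOS_PER_SEC` (assumed here).
fn ticks_to_ns_split(ticks: u64, frequency: u64) -> u64 {
    let secs = ticks / frequency;
    let rem = ticks % frequency;
    secs * NANOS_PER_SEC + rem * NANOS_PER_SEC / frequency
}
```

With a hypothetical frequency of 2,500,000 ticks per second, three hours of ticks make the naive multiplication overflow, while the split version still returns the expected 10,800,000,000,000 ns.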
|  | 50 | + | 
|  | 51 | + 3. `std::rt::lang_start` | 
|  | 52 | +    The runtime startup uses a fairly loose computation to | 
|  | 53 | +    determine the stack extent to pass to | 
|  | 53 | +    `record_os_managed_stack_bounds` (which sets up guard | 
|  | 55 | +    pages and fault handlers to deal with call stack over- | 
|  | 56 | +    or underflows). | 
|  | 57 | + | 
|  | 58 | +    In this case, the arithmetic involved was actually | 
|  | 59 | +    *overflowing*, in this calculation: | 
|  | 60 | + | 
|  | 61 | +    ``` | 
|  | 62 | +    let top_plus_20k = my_stack_top + 20000; | 
|  | 63 | +    ``` | 
|  | 64 | + | 
|  | 65 | +    pnkfelix assumes that in practice this would lead to us | 
|  | 66 | +    attempting to install a guard page starting from some | 
|  | 67 | +    random location, rather than the actual desired | 
|  | 68 | +    address range. While the lack of a guard page in the | 
|  | 69 | +    right spot is probably of no consequence here (assuming | 
|  | 70 | +    that the OS is already going to stop us from actually | 
|  | 71 | +    attempting to write to stack locations resulting from | 
|  | 72 | +    overflow if that ever occurs), attempting to install a | 
|  | 73 | +    guard page on a random unrelated address range seems | 
|  | 74 | +    completely bogus. | 
|  | 75 | + | 
|  | 76 | +    pnkfelix only observed this bug when building a 32-bit | 
|  | 77 | +    Rust on a 64-bit Linux host via cross-compilation, | 
|  | 78 | +    so it probably qualifies as a rare bug. | 
|  | 79 | + | 
|  | 80 | +    reference: | 
|  | 81 | +    https://github.com/rust-lang/rust/pull/22532#issuecomment-76927295 | 
|  | 82 | + | 
|  | 83 | +    UPDATE: In hindsight, one might argue this should be | 
|  | 84 | +    reclassified as a transient overflow, because the whole  | 
|  | 85 | +    computation in context is: | 
|  | 86 | + | 
|  | 87 | +    ``` | 
|  | 88 | +    let my_stack_bottom = | 
|  | 89 | +        my_stack_top + 20000 - OS_DEFAULT_STACK_ESTIMATE; | 
|  | 90 | +    ``` | 
|  | 91 | + | 
|  | 92 | +    where `OS_DEFAULT_STACK_ESTIMATE` is a large value | 
|  | 93 | +    (> 1 MB). | 
|  | 94 | + | 
|  | 95 | +    However, pnkfelix's claim is that this code is playing | 
|  | 96 | +    guessing games; do we really know that the stack is sufficiently | 
|  | 97 | +    large that the computation above does not *underflow*? | 
|  | 98 | + | 
|  | 99 | +    So pnkfelix is going to leave it on this list, at least | 
|  | 100 | +    for now. (pnkfelix subsequently changed the code to use | 
|  | 101 | +    saturating arithmetic in both cases, though obviously | 
|  | 102 | +    that could be tweaked a bit.) | 
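
The saturating variant mentioned at the end of item 3 might look roughly like the sketch below. The function name `estimate_stack_bounds` and the 2 MiB value for `OS_DEFAULT_STACK_ESTIMATE` are assumptions made for illustration; the text above only says the constant is larger than 1 MB.

```rust
/// Assumed value for illustration only (2 MiB).
const OS_DEFAULT_STACK_ESTIMATE: usize = 2 * (1 << 20);

/// Returns a `(bottom, top)` estimate for the stack extent.
/// `saturating_add` clamps at `usize::MAX` instead of wrapping past it,
/// and `saturating_sub` clamps at 0 instead of wrapping below it.
fn estimate_stack_bounds(my_stack_top: usize) -> (usize, usize) {
    let top_plus_20k = my_stack_top.saturating_add(20000);
    let my_stack_bottom = top_plus_20k.saturating_sub(OS_DEFAULT_STACK_ESTIMATE);
    (my_stack_bottom, top_plus_20k)
}
```

Clamping keeps both bounds inside the address space, but it is still a guess: a clamped bound says nothing about where the stack really ends, which is why the estimate could be tweaked further.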
|  | 103 | + | 
|  | 104 | + 4. struct order of evaluation | 
|  | 105 | + | 
|  | 106 | +    There is an explanatory story here: | 
|  | 107 | +    https://github.com/rust-lang/rust/issues/23112 | 
|  | 108 | + | 
|  | 109 | +    In short, one of our tests was quite weak and not | 
|  | 110 | +    actually checking the computed values. But | 
|  | 111 | +    arithmetic-overflow checking immediately pointed | 
|  | 112 | +    out an attempt to reserve a ridiculous amount | 
|  | 113 | +    of space within a `Vec`. (This was on an experimental | 
|  | 114 | +    branch of the codebase where we would fill with | 
|  | 115 | +    a series of 0xC1 bytes when a value was dropped, rather | 
|  | 116 | +    than filling with 0x00 bytes.) | 
|  | 117 | + | 
|  | 118 | +    It is actually quite likely that this test would still | 
|  | 119 | +    have failed without the arithmetic overflow checking, | 
|  | 120 | +    but it probably would have been much harder to diagnose | 
|  | 121 | +    since the panic would have happened at some arbitrary | 
|  | 122 | +    point later in the control flow. | 