|  | 1 | +# Dataflow Analysis | 
|  | 2 | + | 
|  | 3 | +If you work on the MIR, you will frequently come across various flavors of | 
|  | 4 | +[dataflow analysis][wiki]. For example, `rustc` uses dataflow to find | 
|  | 5 | +uninitialized variables, determine what variables are live across a generator | 
|  | 6 | +`yield` statement, and compute which `Place`s are borrowed at a given point in | 
|  | 7 | +the control-flow graph. Dataflow analysis is a fundamental concept in modern | 
|  | 8 | +compilers, and knowledge of the subject will be helpful to prospective | 
|  | 9 | +contributors. | 
|  | 10 | + | 
|  | 11 | +However, this documentation is not a general introduction to dataflow analysis. | 
|  | 12 | +It is merely a description of the framework used to define these analyses in | 
|  | 13 | +`rustc`. It assumes that the reader is familiar with some basic terminology, | 
|  | 14 | +such as "transfer function", "fixpoint" and "lattice". If you're unfamiliar | 
|  | 15 | +with these terms, or if you want a quick refresher, [*Static Program Analysis*] | 
|  | 16 | +by Anders Møller and Michael I. Schwartzbach is an excellent, freely available | 
|  | 17 | +textbook. For those who prefer audiovisual learning, Goethe University | 
|  | 18 | +Frankfurt has published a series of short [YouTube lectures][goethe] in English | 
|  | 19 | +that are very approachable. | 
|  | 20 | + | 
|  | 21 | +## Defining a Dataflow Analysis | 
|  | 22 | + | 
|  | 23 | +The interface for dataflow analyses is split into three traits. The first is | 
|  | 24 | +[`AnalysisDomain`], which must be implemented by *all* analyses. In addition to | 
|  | 25 | +the type of the dataflow state, this trait defines the initial value of that | 
|  | 26 | +state at entry to each block, as well as the direction of the analysis, either | 
|  | 27 | +forward or backward. The domain of your dataflow analysis must be a [lattice][] | 
|  | 28 | +(strictly speaking a join-semilattice) with a well-behaved `join` operator. See | 
|  | 29 | +documentation for the [`lattice`] module, as well as the [`JoinSemiLattice`] | 
|  | 30 | +trait, for more information. | 
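
As a rough illustration of what a "well-behaved `join`" means, here is a minimal, standalone sketch of a powerset domain whose join is set union. The trait below is a hypothetical stand-in written for this example, not the compiler's own definition, and `LocalSet` and `set_of` are invented names. The property the sketch is meant to show is that `join` only ever moves the state upward and reports whether anything changed, which is what lets a solver detect a fixpoint.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a join-semilattice trait (an assumption for
/// this sketch; see the real `JoinSemiLattice` docs for the actual one).
trait JoinSemiLattice {
    /// Joins `other` into `self` and returns `true` if `self` changed.
    fn join(&mut self, other: &Self) -> bool;
}

/// A powerset domain over `u32` indices: bottom is the empty set and
/// join is set union.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct LocalSet(BTreeSet<u32>);

impl JoinSemiLattice for LocalSet {
    fn join(&mut self, other: &Self) -> bool {
        let before = self.0.len();
        self.0.extend(other.0.iter().copied());
        // Union can only add elements, so a length change means the
        // state moved strictly upward in the lattice.
        self.0.len() != before
    }
}

/// Illustrative helper for building a `LocalSet` from a slice.
fn set_of(items: &[u32]) -> LocalSet {
    LocalSet(items.iter().copied().collect())
}
```

Joining a state with something it already contains returns `false`, so a solver built on this trait can stop revisiting a block once none of its inputs change.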
|  | 31 | + | 
|  | 32 | +You must then provide *either* a direct implementation of the [`Analysis`] trait | 
|  | 33 | +*or* an implementation of the proxy trait [`GenKillAnalysis`]. The latter is for | 
|  | 34 | +so-called ["gen-kill" problems], which have a simple class of transfer function | 
|  | 35 | +that can be applied very efficiently. Analyses whose domain is not a `BitSet` | 
|  | 36 | +of some index type, or whose transfer functions cannot be expressed through | 
|  | 37 | +"gen" and "kill" operations, must implement `Analysis` directly, and will run | 
|  | 38 | +slower as a result. All implementers of `GenKillAnalysis` also implement | 
|  | 39 | +`Analysis` automatically via a default `impl`. | 
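
To make the gen-kill idea concrete, the following standalone sketch models the dataflow state as a 64-bit set and shows why such transfer functions are cheap: a sequence of "gen" and "kill" operations can be summarized once per block as `out = (in & !kill) | gen`. All names here (`Bits`, `GenKillSet`, `gen_fact`, `kill_fact`, `block_effects`) are hypothetical and chosen for this example. It is a simplified model under the assumption of at most 64 facts, not the framework's implementation.

```rust
/// Dataflow state: bit `i` set means fact `i` holds (assumes fewer than 64 facts).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Bits(u64);

/// Hypothetical, simplified analogue of a gen/kill interface.
trait GenKill {
    fn gen_fact(&mut self, idx: u32);
    fn kill_fact(&mut self, idx: u32);
}

/// Applying effects directly to the state.
impl GenKill for Bits {
    fn gen_fact(&mut self, idx: u32) {
        self.0 |= 1u64 << idx;
    }
    fn kill_fact(&mut self, idx: u32) {
        self.0 &= !(1u64 << idx);
    }
}

/// A summary of a whole block's effects, independent of the input state.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct GenKillSet {
    gen_bits: u64,
    kill_bits: u64,
}

impl GenKill for GenKillSet {
    fn gen_fact(&mut self, idx: u32) {
        // A later gen overrides an earlier kill of the same fact.
        self.gen_bits |= 1u64 << idx;
        self.kill_bits &= !(1u64 << idx);
    }
    fn kill_fact(&mut self, idx: u32) {
        // A later kill overrides an earlier gen of the same fact.
        self.kill_bits |= 1u64 << idx;
        self.gen_bits &= !(1u64 << idx);
    }
}

impl GenKillSet {
    /// The block transfer function: out = (in & !kill) | gen.
    fn apply(&self, state: Bits) -> Bits {
        Bits((state.0 & !self.kill_bits) | self.gen_bits)
    }
}

/// Example effects of one block, written against the restricted interface.
fn block_effects(trans: &mut impl GenKill) {
    trans.gen_fact(1);
    trans.kill_fact(0);
    trans.kill_fact(2);
}
```

Because `block_effects` only sees the restricted interface, the same code can either mutate a state directly or build a per-block summary that is applied in two bitwise operations on every later visit.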
|  | 40 | + | 
|  | 41 | + | 
|  | 42 | +```text | 
|  | 43 | + AnalysisDomain | 
|  | 44 | +       ^ | 
|  | 45 | +       |          | = has as a supertrait | 
|  | 46 | +       |          . = provides a default impl for | 
|  | 47 | +       | | 
|  | 48 | +   Analysis | 
|  | 49 | +     ^   ^ | 
|  | 50 | +     |   . | 
|  | 51 | +     |   . | 
|  | 52 | +     |   . | 
|  | 53 | + GenKillAnalysis | 
|  | 54 | +
 | 
|  | 55 | +``` | 
|  | 56 | + | 
|  | 57 | +### Transfer Functions and Effects | 
|  | 58 | + | 
|  | 59 | +The dataflow framework in `rustc` allows each statement inside a basic block as | 
|  | 60 | +well as the terminator to define its own transfer function. For brevity, these | 
|  | 61 | +individual transfer functions are known as "effects". Each effect is applied | 
|  | 62 | +successively in dataflow order, and together they define the transfer function | 
|  | 63 | +for the entire basic block. It's also possible to define an effect for | 
|  | 64 | +particular outgoing edges of some terminators (e.g. | 
|  | 65 | +[`apply_call_return_effect`] for the `success` edge of a `Call` | 
|  | 66 | +terminator). Collectively, these are known as per-edge effects. | 
|  | 67 | + | 
|  | 68 | +The only meaningful difference (besides the "apply" prefix) between the methods | 
|  | 69 | +of the `GenKillAnalysis` trait and the `Analysis` trait is that an `Analysis` | 
|  | 70 | +has direct, mutable access to the dataflow state, whereas a `GenKillAnalysis` | 
|  | 71 | +only sees an implementer of the `GenKill` trait, which only allows the `gen` | 
|  | 72 | +and `kill` operations for mutation. | 
|  | 73 | + | 
|  | 74 | +Observant readers of the documentation for these traits may notice that there | 
|  | 75 | +are actually *two* possible effects for each statement and terminator, the | 
|  | 76 | +"before" effect and the unprefixed (or "primary") effect. The "before" effects | 
|  | 77 | +are applied immediately before the unprefixed effect **regardless of whether | 
|  | 78 | +the analysis is backward or forward**. The vast majority of analyses should use | 
|  | 79 | +only the unprefixed effects: Having multiple effects for each statement makes | 
|  | 80 | +it difficult for consumers to know where they should be looking. However, the | 
|  | 81 | +"before" variants can be useful in some scenarios, such as when the effect of | 
|  | 82 | +the right-hand side of an assignment statement must be considered separately | 
|  | 83 | +from the left-hand side. | 
|  | 84 | + | 
|  | 85 | +### Convergence | 
|  | 86 | + | 
|  | 87 | +TODO | 
|  | 88 | + | 
|  | 89 | +## Inspecting the Results of a Dataflow Analysis | 
|  | 90 | + | 
|  | 91 | +Once you have constructed an analysis, you must pass it to an [`Engine`], which | 
|  | 92 | +is responsible for finding the steady-state solution to your dataflow problem. | 
|  | 93 | +You should use the [`into_engine`] method defined on the `Analysis` trait for | 
|  | 94 | +this, since it will use the more efficient `Engine::new_gen_kill` constructor | 
|  | 95 | +when possible. | 
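
For readers who want a picture of what "finding the steady-state solution" involves, below is a minimal worklist sketch for a forward gen-kill problem on a toy CFG. It is not how `Engine` is implemented: the `Block` type, the `solve_forward` function, and the use of a `u64` as the state with bitwise OR as the join are all assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical miniature basic block: a gen/kill summary plus successor indices.
struct Block {
    gen_bits: u64,
    kill_bits: u64,
    successors: Vec<usize>,
}

/// Forward worklist iteration. Returns the fixpoint state at entry to each
/// block, assuming block 0 is the entry block and join is bitwise OR.
fn solve_forward(blocks: &[Block], entry_state: u64) -> Vec<u64> {
    let mut entry_states = vec![0u64; blocks.len()];
    if blocks.is_empty() {
        return entry_states;
    }
    entry_states[0] = entry_state;

    // Start with every block queued so each is processed at least once.
    let mut worklist: VecDeque<usize> = (0..blocks.len()).collect();
    let mut queued = vec![true; blocks.len()];

    while let Some(bb) = worklist.pop_front() {
        queued[bb] = false;
        let block = &blocks[bb];
        // Block transfer function: out = (in & !kill) | gen.
        let exit = (entry_states[bb] & !block.kill_bits) | block.gen_bits;

        for &succ in &block.successors {
            let joined = entry_states[succ] | exit;
            // Re-queue a successor only if its entry state actually grew.
            if joined != entry_states[succ] {
                entry_states[succ] = joined;
                if !queued[succ] {
                    queued[succ] = true;
                    worklist.push_back(succ);
                }
            }
        }
    }
    entry_states
}
```

The loop terminates because each entry state can only gain bits and there are finitely many bits, so eventually no join changes anything and the worklist drains.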
|  | 96 | + | 
|  | 97 | +Calling `iterate_to_fixpoint` on your `Engine` will return a `Results`, which | 
|  | 98 | +contains the dataflow state at fixpoint upon entry of each block. Once you have | 
|  | 99 | +a `Results`, you can inspect the dataflow state at fixpoint at any point in | 
|  | 100 | +the CFG. If you only need the state at a few locations (e.g., each `Drop` | 
|  | 101 | +terminator), use a [`ResultsCursor`]. If you need the state at *every* location, | 
|  | 102 | +a [`ResultsVisitor`] will be more efficient. | 
|  | 103 | + | 
|  | 104 | +```text | 
|  | 105 | +                         Analysis | 
|  | 106 | +                            | | 
|  | 107 | +                            | into_engine(…) | 
|  | 108 | +                            | | 
|  | 109 | +                          Engine | 
|  | 110 | +                            | | 
|  | 111 | +                            | iterate_to_fixpoint() | 
|  | 112 | +                            | | 
|  | 113 | +                         Results | 
|  | 114 | +                         /     \ | 
|  | 115 | + into_results_cursor(…) /       \  visit_with(…) | 
|  | 116 | +                       /         \ | 
|  | 117 | +               ResultsCursor  ResultsVisitor | 
|  | 118 | +``` | 
|  | 119 | + | 
|  | 120 | +For example, the following code uses a [`ResultsVisitor`]... | 
|  | 121 | + | 
|  | 122 | + | 
|  | 123 | +```rust,ignore | 
|  | 124 | +// Assuming `MyVisitor` implements `ResultsVisitor<FlowState = MyAnalysis::Domain>`... | 
|  | 125 | +let mut my_visitor = MyVisitor::new(); | 
|  | 126 | + | 
|  | 127 | +// Inspect the fixpoint state for every location within every block in RPO. | 
|  | 128 | +MyAnalysis::new() | 
|  | 129 | +    .into_engine(tcx, body, def_id) | 
|  | 130 | +    .iterate_to_fixpoint() | 
|  | 131 | +    .visit_with(body, traversal::reverse_postorder(body), &mut my_visitor); | 
|  | 132 | +``` | 
|  | 133 | + | 
|  | 134 | +whereas this code uses [`ResultsCursor`]: | 
|  | 135 | + | 
|  | 136 | +```rust,ignore | 
|  | 137 | +let mut cursor = MyAnalysis::new() | 
|  | 138 | +    .into_engine(tcx, body, def_id) | 
|  | 139 | +    .iterate_to_fixpoint() | 
|  | 140 | +    .into_results_cursor(body); | 
|  | 141 | + | 
|  | 142 | +// Inspect the fixpoint state immediately before each `Drop` terminator. | 
|  | 143 | +for (bb, block) in body.basic_blocks().iter_enumerated() { | 
|  | 144 | +    if let TerminatorKind::Drop { .. } = block.terminator().kind { | 
|  | 145 | +        cursor.seek_before_primary_effect(body.terminator_loc(bb)); | 
|  | 146 | +        let state = cursor.get(); | 
|  | 147 | +        println!("state before drop: {:#?}", state); | 
|  | 148 | +    } | 
|  | 149 | +} | 
|  | 150 | +``` | 
|  | 151 | + | 
|  | 152 | +["gen-kill" problems]: https://en.wikipedia.org/wiki/Data-flow_analysis#Bit_vector_problems | 
|  | 153 | +[*Static Program Analysis*]: https://cs.au.dk/~amoeller/spa/ | 
|  | 154 | +[`AnalysisDomain`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.AnalysisDomain.html | 
|  | 155 | +[`Analysis`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.Analysis.html | 
|  | 156 | +[`GenKillAnalysis`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.GenKillAnalysis.html | 
|  | 157 | +[`JoinSemiLattice`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/lattice/trait.JoinSemiLattice.html | 
|  | 158 | +[`ResultsCursor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/struct.ResultsCursor.html | 
|  | 159 | +[`ResultsVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.ResultsVisitor.html | 
|  | 160 | +[`apply_call_return_effect`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.Analysis.html#tymethod.apply_call_return_effect | 
|  | 161 | +[`into_engine`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.Analysis.html#method.into_engine | 
|  | 162 | +[`lattice`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/lattice/index.html | 
|  | 163 | +[goethe]: https://www.youtube.com/watch?v=NVBQSR_HdL0&list=PL_sGR8T76Y58l3Gck3ZwIIHLWEmXrOLV_&index=2 | 
|  | 164 | +[lattice]: https://en.wikipedia.org/wiki/Lattice_(order) | 
|  | 165 | +[wiki]: https://en.wikipedia.org/wiki/Data-flow_analysis#Basic_principles | 