Cachegrind profiles indicate that the Rust compiler often spends 3-6% of its executed instructions within memcpy (specifically __memcpy_avx_unaligned_erms on my Linux box), which is pretty incredible.
I have modified DHAT to track memcpy/memmove calls and have discovered that a lot are caused by obligation types, such as PendingPredicateObligations and PendingObligations, which are quite large (160 bytes and 136 bytes respectively on my Linux64 machine).
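As a rough illustration of how sizes like these can be checked, here is a minimal sketch using `std::mem::size_of`. `FakePendingObligation` and `bytes_per_move` are hypothetical names invented for this example; the struct is only a 160-byte stand-in, not the real compiler type.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a large obligation type (NOT the real rustc type).
/// Twenty `u64` values give a 160-byte payload.
pub struct FakePendingObligation {
    pub payload: [u64; 20],
}

// Compile-time guard: the build fails if the type's size ever changes.
const _: () = assert!(size_of::<FakePendingObligation>() == 160);

/// Number of bytes that have to be copied when a `T` is moved by value
/// and the optimizer does not elide the move.
pub fn bytes_per_move<T>() -> usize {
    size_of::<T>()
}
```

A guard like the `const` assertion above makes accidental size regressions show up as build errors rather than as extra `memcpy` traffic in a profile.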
For example, for the keccak benchmark, 33% of the copied bytes occur in the swap call in the compress function:
rust/src/librustc_data_structures/obligation_forest/mod.rs
Lines 607 to 620 in a6624ed

    // Now move all popped nodes to the end. Try to keep the order.
    //
    // LOOP INVARIANT:
    // self.nodes[0..i - dead_nodes] are the first remaining nodes
    // self.nodes[i - dead_nodes..i] are all dead
    // self.nodes[i..] are unchanged
    for i in 0..self.nodes.len() {
        match self.nodes[i].state.get() {
            NodeState::Pending | NodeState::Waiting => {
                if dead_nodes > 0 {
                    self.nodes.swap(i, i - dead_nodes);
                    node_rewrites[i] -= dead_nodes;
                }
            }

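To make the cost of that `swap` concrete, the compaction pattern can be sketched in isolation. This is a simplified sketch, not the actual `compress` implementation: `compact_live_first` and the `is_live` predicate are hypothetical names, and the index-rewriting bookkeeping is left out. Each `swap` exchanges two whole elements, so the bytes moved scale with the element size.

```rust
/// Moves live elements to the front of `nodes`, preserving their relative
/// order, and returns how many dead elements ended up at the back.
///
/// Hypothetical helper for illustration only.
pub fn compact_live_first<T>(nodes: &mut [T], is_live: impl Fn(&T) -> bool) -> usize {
    let mut dead = 0;
    for i in 0..nodes.len() {
        if is_live(&nodes[i]) {
            // Invariant: nodes[..i - dead] are live, nodes[i - dead..i] are dead.
            if dead > 0 {
                // Exchanges two full elements; for a 160-byte `T` this is
                // where the copying shows up.
                nodes.swap(i, i - dead);
            }
        } else {
            dead += 1;
        }
    }
    dead
}
```

With the same loop over a slice of `Box<T>`, each swap would exchange only two pointers, at the price of one heap allocation per element. That is one possible trade-off, not necessarily the right fix here.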
For serde, 11% of the copied bytes occur constructing this vector of obligations:
Lines 150 to 157 in a6624ed

    self.out.iter()
        .inspect(|pred| assert!(!pred.has_escaping_bound_vars()))
        .flat_map(|pred| {
            let mut selcx = traits::SelectionContext::new(infcx);
            let pred = traits::normalize(&mut selcx, param_env, cause.clone(), pred);
            once(pred.value).chain(pred.obligations)
        })
        .collect()

and 5% occur appending to this vector of obligations:
rust/src/librustc/traits/project.rs
Lines 570 to 574 in ac21131

    obligations.push(get_paranoid_cache_value_obligation(infcx,
                                                         param_env,
                                                         projection_ty,
                                                         cause,
                                                         depth));

It also looks like some functions such as FulfillmentContext::register_predicate_obligation() might be passed a PredicateObligation by value (using a memcpy) rather than by reference, though I'm not sure about that.
I have some ideas to shrink these types a little, and improve how they're used, but these changes will be tinkering around the edges. It's possible that more fundamental changes to how the obligation system works could elicit bigger wins.
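One common way to shrink a type, shown here purely as an assumption about what such tinkering could look like, is to move a bulky field behind a `Box`. `FatObligation`, `SlimObligation`, and `sizes` are hypothetical names, and the field layouts are invented for the example.

```rust
use std::mem::size_of;

/// Hypothetical "fat" layout: every field is stored inline (96 + 32 bytes).
pub struct FatObligation {
    pub cause: [u64; 12],
    pub predicate: [u64; 4],
}

/// Hypothetical "slim" layout: the bulky `cause` lives on the heap,
/// so only a pointer is stored inline.
pub struct SlimObligation {
    pub cause: Box<[u64; 12]>,
    pub predicate: [u64; 4],
}

/// Returns `(fat, slim)` sizes in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of::<FatObligation>(), size_of::<SlimObligation>())
}
```

The slim layout makes every move, swap, and push cheaper, but costs an allocation when the value is built and a pointer dereference when `cause` is read, so whether it wins depends on how often each operation happens.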