Skip to content

Conversation

@vadorovsky
Copy link
Member

@vadorovsky vadorovsky commented Sep 16, 2025

Problem

Processing new epoch (Bank::process_new_epoch) involves collecting stake delegations twice:

  1. In Stakes::activate_epoch, to create a stake history entry and refresh vote accounts.
  2. In Bank::filter_stake_delegations, which is then used in Bank::calculate_stake_vote_rewards to calculate rewards for stakers and voters.

The overall time of crossing the epoch boundary is ~519ms:

update_epoch_us=519953i

Where the two heaviest operations are collect() calls on stake delegations, each of them taking ~200-220ms:

before_0 before_1

Summary of Changes

Reduce that to just one collect to a Vec<(&Pubkey, &StakeAccount)> done on the beginning of Bank::process_new_epoch and passing the stake delegations to the other methods.

That vector holds references to the stake cache information, which stays behind read-write lock. To make sure that we hold only read lock for the lifetime of the vector, split all operations requiring mutation of Bank and acquiring a write lock out and perform them after dropping the read lock.

In summary, the new order of operations done in Bank::process_new_epoch is:

  1. Acquisition of a read lock on stakes_cache.
    a) Bank::begin_epoch_activation (a new method) that returns updated stake_history and vote_accounts (without modyfing the Bank yet).
    b) Bank::calculate_rewards_and_distribute_vote_rewards, which computes the stake rewards. Pass stake_history, stake_delegations and vote_accounts. Keep the result as rewards_result.
  2. Acquisition of a write lock on stakes_cache.
    a) StakesCache::activate_epoch, which now instead of performing any calculations, simply takes the previously computed stake_history and vote_accounts and assigns them.
  3. Bank::save_rewards (a new method) that takes rewards_result and uses it to update epoch_reward_status and the EpochRewards sysvar.

The new time of crossing the epoch boundary is ~337ms:

update_epoch_us=337371i

There is only one heavy collect() done on stake delegations, which still takes the most of main thread's time. But that's the best we can do while still using im::HashMap.

after_collect

Fixes: #8282

@vadorovsky vadorovsky force-pushed the epoch-one-iteration branch 5 times, most recently from 2b1439a to 5525c4f Compare September 23, 2025 11:54
@vadorovsky vadorovsky changed the title runtime: Iterate over stake delegations only once during epoch activation runtime: Collect stake delegations only once during epoch activation Sep 29, 2025
@vadorovsky vadorovsky force-pushed the epoch-one-iteration branch 12 times, most recently from b52cfbf to 3b50554 Compare October 3, 2025 09:59
@codecov-commenter
Copy link

codecov-commenter commented Oct 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.2%. Comparing base (222485e) to head (4f7bb25).
⚠️ Report is 34 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #8065    +/-   ##
========================================
  Coverage    83.2%    83.2%            
========================================
  Files         863      863            
  Lines      373941   374132   +191     
========================================
+ Hits       311207   311404   +197     
+ Misses      62734    62728     -6     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@vadorovsky vadorovsky marked this pull request as ready for review October 3, 2025 10:50
@vadorovsky vadorovsky force-pushed the epoch-one-iteration branch from 3b50554 to d5159c1 Compare October 4, 2025 08:15
@HaoranYi
Copy link

HaoranYi commented Oct 6, 2025

There is an issue with this PR for epoch_reward_cache.

The PR moved the cache check after the computation. Before the PR, the cache was checked before computing rewards in calculate_rewards_and_distribute_vote_rewards. After the PR, the cache is only populated in save_rewards, which happens after the expensive computation.

@vadorovsky
Copy link
Member Author

vadorovsky commented Oct 6, 2025

After the PR, the cache is only populated in save_rewards, which happens after the expensive computation.

And your worry is that it will take more than one slot? Or is there something else you have in mind?

To be precise - the computation you're talking about, currently takes around 50ms. And the entire epoch boundary after this change - 330ms. So I think we are fine. The overall goal of my optimizations here is to keep epoch boundary below one slot.

@HaoranYi
Copy link

HaoranYi commented Oct 7, 2025

After the PR, the cache is only populated in save_rewards, which happens after the expensive computation.

And your worry is that it will take more than one slot? Or is there something else you have in mind?

To be precise - the computation you're talking about, currently takes around 50ms. And the entire epoch boundary after this change - 330ms. So I think we are fine. The overall goal of my optimizations here is to keep epoch boundary below one slot.

Yes. we used to have many forks at epoch boundary. And the cache is introduced to avoid computing the rewards again at forks. If we are certain that there is going to be no forks, we can remove the cache. In this Pr, we store to the cache but never read from it. seems a waste.

@brooksprumo
Copy link

@HaoranYi @jstarry Can y'all re-review, please?

HaoranYi
HaoranYi previously approved these changes Oct 21, 2025
Copy link

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work!
LGTM.

Copy link

@jstarry jstarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this pretty difficult to review due to how large the diff is (~850 lines). Ideally you don't move functions around and refactor them in the same PR. And I think some of the changes aren't necessary, left comments for those.

}

// Calculate rewards from previous epoch and distribute vote rewards
pub(in crate::bank) fn calculate_rewards_and_distribute_vote_rewards(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't actually distributing the vote rewards anymore, so this function should be renamed to something like calculate_and_cache_epoch_rewards. The CalculateRewardsAndDistributeVoteRewardsResult struct should be renamed as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function didn't get renamed yet

Copy link
Member Author

@vadorovsky vadorovsky Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because I didn't end up splitting calculation and distribution in the current version of this code, I've realized that refactor was unnecessary. The distribution part is still there in the same method:

// verify that we didn't pay any more than we expected to
assert!(point_value.rewards >= total_vote_rewards + total_stake_rewards_lamports);
info!(
"distributed vote rewards: {} out of {}, remaining {}",
total_vote_rewards, point_value.rewards, total_stake_rewards_lamports
);
let (num_stake_accounts, num_vote_accounts) = {
let stakes = self.stakes_cache.stakes();
(
stakes.stake_delegations().len(),
stakes.vote_accounts().len(),
)
};
self.capitalization.fetch_add(total_vote_rewards, Relaxed);
let active_stake = if let Some(stake_history_entry) =
self.stakes_cache.stakes().history().get(prev_epoch)
{
stake_history_entry.effective
} else {
0
};

And the only thing I had to move away in order to change &mut self to &self is the self.set_epoch_reward_status_calculation(distribution_starting_block_height, stake_rewards); call.

14c000f#diff-802ffb9b4536fc89679c0834342ccaa0d32bb482a85e846a444e491a93e684e5
(this commit is a self-contained change of mutability there)

Calling self.capitalization.fetch_add and logging active stake is still fine even after changing &mut self to &self.

Copy link

@jstarry jstarry Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm well you also split out Bank::store_vote_accounts_partitioned (also inside Bank::save_rewards) from Bank::calculate_rewards_and_distribute_vote_rewards so the core part of vote reward distribution is actually not in there. But I see your point about the other distribution code being in there still. Do you think we could move all of that code into Bank::save_rewards (maybe rename this to distribute_vote_rewards) so that all the code for vote reward distribution is in the same place?

Specifically:

  • Bank::update_vote_rewards
  • Capitalization update

And then Bank::create_epoch_rewards_sysvar can be called after Bank::save_rewards.

I don't care a lot about keeping the datapoints ("epoch_rewards" and "epoch-rewards-status-update") consistent but others may disagree.

Copy link
Member Author

@vadorovsky vadorovsky Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I 'm done with the split to Bank::calculate_rewards and Bank::distribute_vote_rewards: 9811963

And then Bank::create_epoch_rewards_sysvar can be called after Bank::save_rewards.

I tried to do that, but I didn't end up pushing that, because:

  1. Calling Bank::create_epoch_rewards_sysvar in begin_partitioned_rewards works (at that point, calculations are already done). Do you really feel like having it there is incorrect?
  2. It needs two other variables that live inside Bank::begin_partitioned_rewards - distribution_starting_block_height and num_partitions.
  3. The diff is pretty noisy (I even had to change the publicity of StakeRewardCalculation and import additional stuff) and doesn't seem to fit your "don't do unnecessary refactors" narrative.
  4. After such extraction, begin_partitioned_rewards only calls calculate_rewards and logs datapoints.
  5. It adds code to already big bank.rs.

See the diff: https://gist.github.com/vadorovsky/58917c7934b9636d3edce5d5cc38871e

If you really want me to extract that code, perhaps creating a yet another method inside calculation.rs would be the way. that is going to be less noisy, but also, in such case, we might think about removing Bank::begin_partitioned_rewards and doing everything in Bank::calculate_rewards.

I don't care a lot about keeping the datapoints ("epoch_rewards" and "epoch-rewards-status-update") consistent but others may disagree.

I kept all datapoints from the former calculate_rewards_and_distribute_vote_rewards, but split across the two methods.

In general, in this PR, I removed one datapoint - about filtering delegations - because now we just return a lazy iterator and we filter on each iteration.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling Bank::create_epoch_rewards_sysvar in begin_partitioned_rewards works (at that point, calculations are already done). Do you really feel like having it there is incorrect?

My suggestion wasn't about correctness, just about making the code easier to understand. I think it's better to do read-only calculation changes separately from the changes that modify state. Methods named with "compute" and "calculate" don't seem like they would have side effects like creating a sysvar for example.

I put together a commit with some suggested refactorings here: jstarry@ed944e0

Copy link

@jstarry jstarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalization update needs to be fixed and should have a test

@vadorovsky vadorovsky force-pushed the epoch-one-iteration branch 3 times, most recently from 8b44339 to 01c5aac Compare October 29, 2025 09:26
@vadorovsky vadorovsky marked this pull request as draft October 31, 2025 07:49
@vadorovsky vadorovsky force-pushed the epoch-one-iteration branch 2 times, most recently from 01b54bd to d893331 Compare November 3, 2025 17:06
@vadorovsky vadorovsky marked this pull request as ready for review November 3, 2025 17:30
@vadorovsky vadorovsky force-pushed the epoch-one-iteration branch 2 times, most recently from 0723476 to ef1f93b Compare November 4, 2025 14:21
That allows calling the method without ownership over `PointValue`.
Copies of the integers from the structure are cheap.

Currently, the validator code calling the method has an owned
`PointerValue` instance; but that's going to change in the following
changes.
Before this change, `Stakes::activate_epoch` was performing calculations
and mutating the cache at the same time. The latter was the reason why
it was taking `&mut self` and acquiring a write lock on the cache. But
the calculations themselves don't require mutability and a write lock.

Split out the new `Stakes::calculate_activated_stake` method that
performs only calculations, needs only an immutable `&self` and a read
lock. Then reduce the scope of `Stake::activate_epoch` just to assigning
the values computed by `calculate_activated_stake`.

Also, add the new `Bank::compute_new_epoch_caches_and_rewards` method,
called in `Bank::process_new_epoch`, that holds a read lock on the
stakes cache.
Stake delegations are stored as hash array mapped trie (HAMT)[0], which
means that inserts, deletions and lookups are average-case O(1) and
worst-case O(log n). However, the performance of iterations is poor due
to depth-first traversal and jumps. Currently it's also impossible to
iterate over it with rayon.

That issue is known and handled by converting the HAMT to a vector with
`stakes.stake_delegations.iter().collect()`. Move that trick to a
dedicated method that describes the performance consequences.

[0] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
`filter_stake_delegations` was always collecting the stake delegations.

To allow re-using already existing vectors or slices of stake
delegations, add `FilteredStakeDelegation` type that wraps a `Cow` of
stake delegations and provides a parallel iterator that yields elements,
filtering them by the minimum stake.

Note that the wrapper yields an `Option`. That makes it possible to know
the size of the iterator without collecting elements. That property is
critical for the ability to pre-allocate data structures that contain
processed stake delegations. However, it adds a necessity of handling
`None` elements while iterating.
@vadorovsky vadorovsky requested a review from jstarry November 5, 2025 09:47
}

// Calculate rewards from previous epoch and distribute vote rewards
pub(in crate::bank) fn calculate_rewards_and_distribute_vote_rewards(
Copy link

@jstarry jstarry Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm well you also split out Bank::store_vote_accounts_partitioned (also inside Bank::save_rewards) from Bank::calculate_rewards_and_distribute_vote_rewards so the core part of vote reward distribution is actually not in there. But I see your point about the other distribution code being in there still. Do you think we could move all of that code into Bank::save_rewards (maybe rename this to distribute_vote_rewards) so that all the code for vote reward distribution is in the same place?

Specifically:

  • Bank::update_vote_rewards
  • Capitalization update

And then Bank::create_epoch_rewards_sysvar can be called after Bank::save_rewards.

I don't care a lot about keeping the datapoints ("epoch_rewards" and "epoch-rewards-status-update") consistent but others may disagree.

`Bank::begin_partitioned_rewards` was taking `&mut self`, even though
all the calculations done by it do not require mutability and the only
LOC requiring it was a call to `set_epoch_reward_status_calculation`.

`Bank::calculate_rewards_and_distribute_vote_rewards` was calling
`Bank::store_vote_accounts_partitioned` that acquires a write lock on
stakes cache.

To make a clear boundary between the code that:

* Does not mutate and acquires only read locks.
* Mutates and acquires write locks.

Split the mentioned code into two methods:

* `Bank::calculate_rewards` - takes `&self`, does not acquire any locks.
  However, we are going to pass a slice of delegations, that acquires
  a read lock on status cache, in the follow up changes there.
* `Bank::distribute_vote_rewards` - takes `&mut self` and acquires a
  write lock on status cache.
…ake`

`Bank::calculate_activated_stake` was collecting stake delegations to
a vector internally. That prevents us from re-using it in the latter
parts of epoch boundary.

Fix that by taking a slice of stake delegations.
Processing new epoch (`Bank::process_new_epoch`) involves collecting
stake delegations twice:

1) In `Bank::compute_new_epoch_caches_and_rewards`, to create a stake
   history entry and refresh vote accounts.
2) In `Bank::get_epoch_reward_calculate_param_info`, which is then used
   in `Bank::calculate_stake_vote_rewards` to calculate rewards for
   stakers and voters.

Reduce that to just one collect by passing the vector 1) with freshly
computed stake history and vote accounts to `Bank::begin_partitioned_rewards`.
This way, we can avoid calling `Bank::get_epoch_reward_calculate_param_info`.
This method is now used only for recalculations after snapshot restore.
Change the name to `get_epoch_params_for_recalculation` and adjust its
documentation.
Copy link

@jstarry jstarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me. I added a comment with some more suggested refactorings but this is fine as is already. Nice work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

runtime: Stake delegations are collected twice during Bank::process_new_epoch

6 participants