runtime: Collect stake delegations only once during epoch activation #8065

vadorovsky · 2025-09-16T11:59:14Z

Problem

Processing a new epoch (Bank::process_new_epoch) involves collecting stake delegations twice:

1) In Stakes::activate_epoch, to create a stake history entry and refresh vote accounts.
2) In Bank::filter_stake_delegations, which is then used in Bank::calculate_stake_vote_rewards to calculate rewards for stakers and voters.

The overall time of crossing the epoch boundary is ~519ms:

update_epoch_us=519953i

The two heaviest operations are the collect() calls on stake delegations, each of them taking ~200-220ms.

Summary of Changes

Reduce that to just one collect into a Vec<(&Pubkey, &StakeAccount)>, done at the beginning of Bank::process_new_epoch, and pass the stake delegations to the other methods.

That vector holds references to the stake cache information, which sits behind a read-write lock. To make sure that we hold only a read lock for the lifetime of the vector, split out all operations that require mutating Bank and acquiring a write lock, and perform them after dropping the read lock.

In summary, the new order of operations done in Bank::process_new_epoch is:

1. Acquisition of a read lock on stakes_cache.
   a) Bank::begin_epoch_activation (a new method) that returns updated stake_history and vote_accounts (without modifying the Bank yet).
   b) Bank::calculate_rewards_and_distribute_vote_rewards, which computes the stake rewards. Pass stake_history, stake_delegations and vote_accounts. Keep the result as rewards_result.
2. Acquisition of a write lock on stakes_cache.
   a) StakesCache::activate_epoch, which now, instead of performing any calculations, simply takes the previously computed stake_history and vote_accounts and assigns them.
3. Bank::save_rewards (a new method) that takes rewards_result and uses it to update epoch_reward_status and the EpochRewards sysvar.
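
The ordering above can be sketched roughly as follows. This is a minimal illustration, not the actual agave code: `Stakes`, `Bank`, the `u64` stand-in for `Pubkey`, and the 1% reward formula are all simplified assumptions. It only shows the shape of "collect once under a read lock, compute, drop the lock, then mutate under a write lock".

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical, simplified stand-ins for the real runtime types.
type Pubkey = u64;

#[derive(Default)]
struct Stakes {
    delegations: HashMap<Pubkey, u64>, // stake account -> delegated stake
    history: Vec<u64>,                 // one effective-stake entry per epoch
}

struct Bank {
    stakes_cache: RwLock<Stakes>,
    rewards: Option<u64>,
}

impl Bank {
    fn process_new_epoch(&mut self) {
        // Phase 1: read lock only. Collect once and compute everything
        // from the same vector of references.
        let (new_history_entry, rewards_result) = {
            let stakes = self.stakes_cache.read().unwrap();
            let delegations: Vec<(&Pubkey, &u64)> = stakes.delegations.iter().collect();
            let new_history_entry: u64 = delegations.iter().map(|(_, stake)| **stake).sum();
            // Placeholder reward formula (assumption): 1% of each delegation.
            let rewards_result: u64 = delegations.iter().map(|(_, stake)| **stake / 100).sum();
            (new_history_entry, rewards_result)
        }; // The vector and the read guard are dropped here.

        // Phase 2: write lock, assignment of precomputed values only.
        self.stakes_cache.write().unwrap().history.push(new_history_entry);

        // Phase 3: store the reward results on the bank.
        self.rewards = Some(rewards_result);
    }
}
```

Because the results of phase 1 are owned values, the borrow checker accepts taking the write lock afterwards. Keeping the vector alive past the end of the block would not compile.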

The new time of crossing the epoch boundary is ~337ms:

update_epoch_us=337371i

There is only one heavy collect() done on stake delegations, which still takes most of the main thread's time. But that's the best we can do while still using im::HashMap.

Fixes: #8282

codecov-commenter · 2025-10-03T10:41:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.2%. Comparing base (222485e) to head (4f7bb25).
⚠️ Report is 34 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff            @@
##           master    #8065    +/-   ##
========================================
  Coverage    83.2%    83.2%            
========================================
  Files         863      863            
  Lines      373941   374132   +191     
========================================
+ Hits       311207   311404   +197     
+ Misses      62734    62728     -6

HaoranYi · 2025-10-06T19:02:58Z

There is an issue with this PR for epoch_reward_cache.

The PR moved the cache check after the computation. Before the PR, the cache was checked before computing rewards in calculate_rewards_and_distribute_vote_rewards. After the PR, the cache is only populated in save_rewards, which happens after the expensive computation.

vadorovsky · 2025-10-06T21:55:49Z

> After the PR, the cache is only populated in save_rewards, which happens after the expensive computation.

And your worry is that it will take more than one slot? Or is there something else you have in mind?

To be precise - the computation you're talking about, currently takes around 50ms. And the entire epoch boundary after this change - 330ms. So I think we are fine. The overall goal of my optimizations here is to keep epoch boundary below one slot.

HaoranYi · 2025-10-07T14:22:03Z

> > After the PR, the cache is only populated in save_rewards, which happens after the expensive computation.
>
> And your worry is that it will take more than one slot? Or is there something else you have in mind?
>
> To be precise - the computation you're talking about, currently takes around 50ms. And the entire epoch boundary after this change - 330ms. So I think we are fine. The overall goal of my optimizations here is to keep epoch boundary below one slot.

Yes. We used to have many forks at the epoch boundary, and the cache was introduced to avoid computing the rewards again on forks. If we are certain that there are going to be no forks, we can remove the cache. In this PR, we store to the cache but never read from it, which seems a waste.
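
To illustrate the concern, here is a rough sketch of a check-before-compute cache. The names (`EpochRewardsCache`, `get_or_compute`) and the choice of the parent slot as the cache key are assumptions made for the example, not the real `epoch_reward_cache` API. The point is only that the lookup has to happen before the expensive calculation for forks to benefit from it.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical cache key: the slot of the last bank of the previous epoch.
type ParentSlot = u64;

#[derive(Default)]
struct EpochRewardsCache {
    entries: Mutex<HashMap<ParentSlot, Arc<Vec<u64>>>>,
    computations: Mutex<u32>, // counts how often the closure really ran
}

impl EpochRewardsCache {
    // Returns the cached rewards for `parent`, running `compute` only on a miss.
    // The map lock is held during `compute` to keep the sketch simple.
    fn get_or_compute(
        &self,
        parent: ParentSlot,
        compute: impl FnOnce() -> Vec<u64>,
    ) -> Arc<Vec<u64>> {
        let mut entries = self.entries.lock().unwrap();
        if let Some(hit) = entries.get(&parent) {
            return Arc::clone(hit);
        }
        *self.computations.lock().unwrap() += 1;
        let rewards = Arc::new(compute());
        entries.insert(parent, Arc::clone(&rewards));
        rewards
    }
}
```

With this shape, two banks forking off the same parent share one calculation. Populating the cache only after the calculation has already run, without a lookup first, gives no such saving.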

brooksprumo · 2025-10-21T13:51:27Z

@HaoranYi @jstarry Can y'all re-review, please?

HaoranYi

Excellent work!
LGTM.

jstarry

I found this pretty difficult to review due to how large the diff is (~850 lines). Ideally you don't move functions around and refactor them in the same PR. And I think some of the changes aren't necessary, left comments for those.

jstarry · 2025-10-13T05:53:25Z

runtime/src/bank/partitioned_epoch_rewards/calculation.rs

+    }
+
+    // Calculate rewards from previous epoch and distribute vote rewards
+    pub(in crate::bank) fn calculate_rewards_and_distribute_vote_rewards(


This isn't actually distributing the vote rewards anymore, so this function should be renamed to something like calculate_and_cache_epoch_rewards. The CalculateRewardsAndDistributeVoteRewardsResult struct should be renamed as well.

This function didn't get renamed yet

That's because I didn't end up splitting calculation and distribution in the current version of this code, I've realized that refactor was unnecessary. The distribution part is still there in the same method:

agave/runtime/src/bank/partitioned_epoch_rewards/calculation.rs

Lines 211 to 233 in 14c000f

    // verify that we didn't pay any more than we expected to
    assert!(point_value.rewards >= total_vote_rewards + total_stake_rewards_lamports);
    info!(
        "distributed vote rewards: {} out of {}, remaining {}",
        total_vote_rewards, point_value.rewards, total_stake_rewards_lamports
    );

    let (num_stake_accounts, num_vote_accounts) = {
        let stakes = self.stakes_cache.stakes();
        (
            stakes.stake_delegations().len(),
            stakes.vote_accounts().len(),
        )
    };
    self.capitalization.fetch_add(total_vote_rewards, Relaxed);

    let active_stake = if let Some(stake_history_entry) =
        self.stakes_cache.stakes().history().get(prev_epoch)
    {
        stake_history_entry.effective
    } else {
        0
    };

And the only thing I had to move away in order to change &mut self to &self is the self.set_epoch_reward_status_calculation(distribution_starting_block_height, stake_rewards); call.

14c000f#diff-802ffb9b4536fc89679c0834342ccaa0d32bb482a85e846a444e491a93e684e5
(this commit is a self-contained change of mutability there)

Calling self.capitalization.fetch_add and logging active stake is still fine even after changing &mut self to &self.
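
For readers less familiar with why that works: atomics provide interior mutability, so `fetch_add` only needs a shared reference. A minimal sketch, with a hypothetical one-field `Bank` and a made-up `add_vote_rewards` helper:

```rust
use std::sync::atomic::{AtomicU64, Ordering::Relaxed};

// Hypothetical, stripped-down Bank holding only the capitalization counter.
struct Bank {
    capitalization: AtomicU64,
}

impl Bank {
    // Takes `&self`: the atomic is updated through a shared reference.
    // Returns the value held before the addition, as `fetch_add` does.
    fn add_vote_rewards(&self, total_vote_rewards: u64) -> u64 {
        self.capitalization.fetch_add(total_vote_rewards, Relaxed)
    }
}
```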

Hmm well you also split out Bank::store_vote_accounts_partitioned (also inside Bank::save_rewards) from Bank::calculate_rewards_and_distribute_vote_rewards so the core part of vote reward distribution is actually not in there. But I see your point about the other distribution code being in there still. Do you think we could move all of that code into Bank::save_rewards (maybe rename this to distribute_vote_rewards) so that all the code for vote reward distribution is in the same place?

Specifically:

Bank::update_vote_rewards

Capitalization update

And then Bank::create_epoch_rewards_sysvar can be called after Bank::save_rewards.

I don't care a lot about keeping the datapoints ("epoch_rewards" and "epoch-rewards-status-update") consistent but others may disagree.

OK, I'm done with the split to Bank::calculate_rewards and Bank::distribute_vote_rewards: 9811963

> And then Bank::create_epoch_rewards_sysvar can be called after Bank::save_rewards.

I tried to do that, but I didn't end up pushing that, because:

* Calling Bank::create_epoch_rewards_sysvar in begin_partitioned_rewards works (at that point, calculations are already done). Do you really feel like having it there is incorrect?
* It needs two other variables that live inside Bank::begin_partitioned_rewards: distribution_starting_block_height and num_partitions.
* The diff is pretty noisy (I even had to change the visibility of StakeRewardCalculation and import additional stuff) and doesn't seem to fit your "don't do unnecessary refactors" narrative.
* After such extraction, begin_partitioned_rewards only calls calculate_rewards and logs datapoints.
* It adds code to the already big bank.rs.

See the diff: https://gist.github.com/vadorovsky/58917c7934b9636d3edce5d5cc38871e

If you really want me to extract that code, perhaps creating yet another method inside calculation.rs would be the way. That is going to be less noisy, but in that case we might also think about removing Bank::begin_partitioned_rewards and doing everything in Bank::calculate_rewards.

> I don't care a lot about keeping the datapoints ("epoch_rewards" and "epoch-rewards-status-update") consistent but others may disagree.

I kept all datapoints from the former calculate_rewards_and_distribute_vote_rewards, but split across the two methods.

In general, in this PR, I removed one datapoint - about filtering delegations - because now we just return a lazy iterator and we filter on each iteration.

> Calling Bank::create_epoch_rewards_sysvar in begin_partitioned_rewards works (at that point, calculations are already done). Do you really feel like having it there is incorrect?

My suggestion wasn't about correctness, just about making the code easier to understand. I think it's better to do read-only calculation changes separately from the changes that modify state. Methods named with "compute" and "calculate" don't seem like they would have side effects like creating a sysvar for example.

I put together a commit with some suggested refactorings here: jstarry@ed944e0

jstarry

Capitalization update needs to be fixed and should have a test

That allows calling the method without ownership over `PointValue`. Copies of the integers from the structure are cheap. Currently, the validator code calling the method has an owned `PointValue` instance, but that's going to change in the following changes.

Before this change, `Stakes::activate_epoch` was performing calculations and mutating the cache at the same time. The latter was the reason why it was taking `&mut self` and acquiring a write lock on the cache. But the calculations themselves don't require mutability and a write lock. Split out the new `Stakes::calculate_activated_stake` method that performs only calculations, needs only an immutable `&self` and a read lock. Then reduce the scope of `Stakes::activate_epoch` just to assigning the values computed by `calculate_activated_stake`. Also, add the new `Bank::compute_new_epoch_caches_and_rewards` method, called in `Bank::process_new_epoch`, that holds a read lock on the stakes cache.

Stake delegations are stored as a hash array mapped trie (HAMT)[0], which means that inserts, deletions and lookups are average-case O(1) and worst-case O(log n). However, the performance of iterations is poor due to depth-first traversal and jumps. Currently it's also impossible to iterate over it with rayon. That issue is known and handled by converting the HAMT to a vector with `stakes.stake_delegations.iter().collect()`. Move that trick to a dedicated method that describes the performance consequences.

[0] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
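
A small sketch of that trick, using `std::collections::HashMap` as a stand-in for `im::HashMap` so that the example has no external dependencies. The `Pubkey` alias, the `StakeAccount` struct and both function names are assumptions for illustration only.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real types.
type Pubkey = [u8; 32];

struct StakeAccount {
    lamports: u64,
}

/// Collects the map into a contiguous vector of references.
///
/// This is a full traversal of the map, so it is expensive and should be
/// done once. The resulting vector is cheap to iterate repeatedly and can
/// be split into chunks for parallel processing.
fn stake_delegations_vec(
    map: &HashMap<Pubkey, StakeAccount>,
) -> Vec<(&Pubkey, &StakeAccount)> {
    map.iter().collect()
}

/// Example consumer that works on a slice, so it never touches the map.
fn total_stake(delegations: &[(&Pubkey, &StakeAccount)]) -> u64 {
    delegations.iter().map(|(_, account)| account.lamports).sum()
}
```

Any number of consumers can then take `&[(&Pubkey, &StakeAccount)]` without paying for another traversal of the map.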

`filter_stake_delegations` was always collecting the stake delegations. To allow re-using already existing vectors or slices of stake delegations, add `FilteredStakeDelegation` type that wraps a `Cow` of stake delegations and provides a parallel iterator that yields elements, filtering them by the minimum stake. Note that the wrapper yields an `Option`. That makes it possible to know the size of the iterator without collecting elements. That property is critical for the ability to pre-allocate data structures that contain processed stake delegations. However, it adds a necessity of handling `None` elements while iterating.
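
The idea can be sketched as below. This is a simplified, sequential stand-in: the real type yields a parallel iterator, whereas this sketch uses a plain `ExactSizeIterator`, and it stores owned tuples with a `u64` key instead of references to the real `Pubkey` and `StakeAccount`. All field and method names here are assumptions.

```rust
use std::borrow::Cow;

// Hypothetical, simplified stand-ins for the real types.
type Pubkey = u64;

#[derive(Clone, Debug, PartialEq)]
struct StakeAccount {
    stake: u64,
}

struct FilteredStakeDelegations<'a> {
    // Either borrows an existing slice or owns a freshly collected vector.
    delegations: Cow<'a, [(Pubkey, StakeAccount)]>,
    min_stake: u64,
}

impl<'a> FilteredStakeDelegations<'a> {
    // Number of slots the iterator yields, including filtered-out ones.
    fn len(&self) -> usize {
        self.delegations.len()
    }

    // Yields `None` for delegations below the minimum instead of skipping
    // them, so the iterator length is known up front.
    fn iter(&self) -> impl ExactSizeIterator<Item = Option<&(Pubkey, StakeAccount)>> + '_ {
        let min_stake = self.min_stake;
        self.delegations
            .iter()
            .map(move |entry| (entry.1.stake >= min_stake).then_some(entry))
    }
}
```

A caller that only wants the surviving elements can use `iter().flatten()`, while a caller that pre-allocates can rely on `len()` as an upper bound.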

jstarry · 2025-11-05T17:30:43Z

runtime/src/bank/partitioned_epoch_rewards/calculation.rs

+    }
+
+    // Calculate rewards from previous epoch and distribute vote rewards
+    pub(in crate::bank) fn calculate_rewards_and_distribute_vote_rewards(


Hmm well you also split out Bank::store_vote_accounts_partitioned (also inside Bank::save_rewards) from Bank::calculate_rewards_and_distribute_vote_rewards so the core part of vote reward distribution is actually not in there. But I see your point about the other distribution code being in there still. Do you think we could move all of that code into Bank::save_rewards (maybe rename this to distribute_vote_rewards) so that all the code for vote reward distribution is in the same place?

Specifically:

Bank::update_vote_rewards

Capitalization update

And then Bank::create_epoch_rewards_sysvar can be called after Bank::save_rewards.

I don't care a lot about keeping the datapoints ("epoch_rewards" and "epoch-rewards-status-update") consistent but others may disagree.

`Bank::begin_partitioned_rewards` was taking `&mut self`, even though none of the calculations done by it require mutability and the only line requiring it was a call to `set_epoch_reward_status_calculation`. `Bank::calculate_rewards_and_distribute_vote_rewards` was calling `Bank::store_vote_accounts_partitioned`, which acquires a write lock on the stakes cache.

To make a clear boundary between the code that:

* Does not mutate and acquires only read locks.
* Mutates and acquires write locks.

Split the mentioned code into two methods:

* `Bank::calculate_rewards` - takes `&self`, does not acquire any locks. However, the follow-up changes are going to pass it a slice of delegations, which requires holding a read lock on the stakes cache.
* `Bank::distribute_vote_rewards` - takes `&mut self` and acquires a write lock on the stakes cache.
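
A rough sketch of that boundary, where the method receivers make the split visible in the type signatures. Only the two method names come from this change. The `Bank` fields, the `VoteRewards` struct, the tuple layout of a delegation and the 1% reward formula are placeholders invented for the example.

```rust
use std::collections::HashMap;

type Pubkey = u64; // hypothetical stand-in

// Owned result of the read-only phase.
struct VoteRewards {
    per_vote_account: Vec<(Pubkey, u64)>,
    total: u64,
}

#[derive(Default)]
struct Bank {
    vote_balances: HashMap<Pubkey, u64>,
    capitalization: u64,
}

impl Bank {
    // Read-only: takes `&self` and a slice of
    // (stake account, vote account, stake) tuples.
    fn calculate_rewards(&self, delegations: &[(Pubkey, Pubkey, u64)]) -> VoteRewards {
        let mut per_vote: HashMap<Pubkey, u64> = HashMap::new();
        for (_stake_account, vote_account, stake) in delegations {
            // Placeholder formula (assumption): 1% of the delegated stake.
            *per_vote.entry(*vote_account).or_default() += *stake / 100;
        }
        let mut per_vote_account: Vec<(Pubkey, u64)> = per_vote.into_iter().collect();
        per_vote_account.sort_unstable();
        let total: u64 = per_vote_account.iter().map(|(_, reward)| *reward).sum();
        VoteRewards { per_vote_account, total }
    }

    // Mutating: takes `&mut self` and only applies precomputed values.
    fn distribute_vote_rewards(&mut self, rewards: VoteRewards) {
        for (vote_account, reward) in rewards.per_vote_account {
            *self.vote_balances.entry(vote_account).or_default() += reward;
        }
        self.capitalization += rewards.total;
    }
}
```

Since `calculate_rewards` cannot mutate the bank, any side effect has to go through `distribute_vote_rewards`, which is what keeps the two phases separate.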

…ake` `Bank::calculate_activated_stake` was collecting stake delegations to a vector internally. That prevents us from re-using it in the later parts of the epoch boundary. Fix that by taking a slice of stake delegations.

Processing a new epoch (`Bank::process_new_epoch`) involves collecting stake delegations twice:

1) In `Bank::compute_new_epoch_caches_and_rewards`, to create a stake history entry and refresh vote accounts.
2) In `Bank::get_epoch_reward_calculate_param_info`, which is then used in `Bank::calculate_stake_vote_rewards` to calculate rewards for stakers and voters.

Reduce that to just one collect by passing the vector from 1), together with the freshly computed stake history and vote accounts, to `Bank::begin_partitioned_rewards`. This way, we can avoid calling `Bank::get_epoch_reward_calculate_param_info`.

This method is now used only for recalculations after snapshot restore. Change the name to `get_epoch_params_for_recalculation` and adjust its documentation.

jstarry

This looks correct to me. I added a comment with some more suggested refactorings but this is fine as is already. Nice work!

vadorovsky mentioned this pull request Sep 16, 2025

runtime: Avoid redundant collections of stake delegations into a vector #7770

Closed

vadorovsky force-pushed the epoch-one-iteration branch 5 times, most recently from 2b1439a to 5525c4f Compare September 23, 2025 11:54

vadorovsky force-pushed the epoch-one-iteration branch from 5525c4f to 5917723 Compare September 29, 2025 12:51

vadorovsky changed the title ~~runtime: Iterate over stake delegations only once during epoch activation~~ runtime: Collect stake delegations only once during epoch activation Sep 29, 2025

vadorovsky force-pushed the epoch-one-iteration branch 12 times, most recently from b52cfbf to 3b50554 Compare October 3, 2025 09:59

vadorovsky marked this pull request as ready for review October 3, 2025 10:50

vadorovsky requested review from HaoranYi, alessandrod, jstarry and t-nelson October 3, 2025 10:52

vadorovsky force-pushed the epoch-one-iteration branch from 3b50554 to d5159c1 Compare October 4, 2025 08:15

HaoranYi previously approved these changes Oct 21, 2025

jstarry reviewed Oct 21, 2025

jstarry requested changes Oct 21, 2025

vadorovsky dismissed HaoranYi’s stale review via 6147c4d October 28, 2025 16:17

vadorovsky force-pushed the epoch-one-iteration branch 3 times, most recently from 8b44339 to 01c5aac Compare October 29, 2025 09:26

vadorovsky marked this pull request as draft October 31, 2025 07:49

This was referenced Oct 31, 2025

runtime: Add test for epoch boundary #8801

Merged

runtime: Test epoch rewards cache for multiple forks #8802

Merged

vadorovsky force-pushed the epoch-one-iteration branch 2 times, most recently from 01b54bd to d893331 Compare November 3, 2025 17:06

vadorovsky marked this pull request as ready for review November 3, 2025 17:30

vadorovsky force-pushed the epoch-one-iteration branch 2 times, most recently from 0723476 to ef1f93b Compare November 4, 2025 14:21

vadorovsky added 4 commits November 5, 2025 09:15

vadorovsky force-pushed the epoch-one-iteration branch from ef1f93b to 8ec02be Compare November 5, 2025 08:15

vadorovsky requested a review from jstarry November 5, 2025 09:47

jstarry reviewed Nov 5, 2025

vadorovsky force-pushed the epoch-one-iteration branch from ee4584c to 1d141dd Compare November 6, 2025 13:11

vadorovsky added 4 commits November 6, 2025 14:39

runtime: Rename get_epoch_reward_calculate_param_info

4f7bb25

This method is now used only for recalculations after snapshot restore. Change the name to `get_epoch_params_for_recalculation` and adjust its documentation.

vadorovsky force-pushed the epoch-one-iteration branch from 1d141dd to 4f7bb25 Compare November 6, 2025 13:45

jstarry approved these changes Nov 6, 2025

