Skip to content

Conversation

@2501babe
Copy link
Member

@2501babe 2501babe commented Jul 28, 2025

Problem

the bpf stake program is now active on all clusters and can be removed from builtins

Summary of Changes

remove it from BUILTINS, add the bpf stake program 1.0.0 binary blob to CORE_BPF_PROGRAMS, and fix various tests and test setups to deal with this:

  • solana-test-validator and solana-program-test now ship with the core bpf stake program as a drop-in replacement
  • runtime, ledger, and rpc tests are changed in ad hoc ways to no longer expect the stake program to exist; in general they used it for convenience and were not testing actual stake program functionality
  • one partitioned epoch rewards test which tested stake program non-operation during the rewards period is superceded by a new, comprehensive test in the stake program repo
  • the solana-stake-accounts cli adds the stake program to its bank for its tests, which are not separable into the stake repo
  • delete programs/stake-tests/. these were tests for MoveStake and MoveLamports which have long since been duplicated in the bpf stake repo

this pr is already quite difficult to review. if everyone is happy with this idea, id like to merge this as a standalone pr without removing the stake program code, and do that in a followup pr. removing the stake program code is nontrivial because it contains a bunch of utility functions which are used in tests, during genesis, and possibly elsewhere. it also contains some of the rewards-related code still running in agave. and we may want to reexport various things to not break downstream too badly

these changes will have to be debated (particularly breaking the interface, whether we move things into new packages, etc) and itd be better to look at them in isolation

@2501babe 2501babe self-assigned this Jul 28, 2025
@2501babe 2501babe force-pushed the 20250728_stakebpf branch 3 times, most recently from a6cde57 to f502627 Compare July 29, 2025 13:00
@2501babe
Copy link
Member Author

2501babe commented Jul 29, 2025

playing whackamole with what i am supposed to change now that the stake program is migrated...

  • builtins-default-costs/src/lib.rs:
    • decrement TOTAL_COUNT_BUILTINS
    • delete stake program from MIGRATING_BUILTINS_COSTS
  • builtins/src/lib.rs:
    • delete prototype from BUILTINS
    • delete pubkey from buffer_accounts
  • program-test/src/programs.rs: add binary blob to CORE_BPF_PROGRAMS
  • runtime/src/genesis_utils.rs: decrement NUM_BUILTIN_PROGRAMS
  • test-validator/src/lib.rs: dont deactivate migrate_stake_program_to_core_bpf

plus various test changes already committed...

still todo:

  • cli/tests/stake.rs: delegate fails because we need some slack in compute usage, it uses more than simulate because votes get added to vote state
  • runtime/tests/stake.rs: bank doesnt have stake program, add it or move these tests to the stake repo... it uses some bank functionality explicitly, not sure how much it matters
  • ledger/src/staking_utils.rs: just creates and delegates a stake account, we can mock this
  • runtime/src/bank/partitioned_epoch_rewards/mod.rs: test_program_execution_restricted_for_stake_account_in_reward_period explicitly tests the stake program blocks operation in rewards period, i dont think we can do away with it
  • LocalCluster: there was a test if stake is available at genesis (its not) which i removed but unclear if there are tests that actually use stake here, local cluster tests just fail a ton on this branch

more todo:

  • rpc_pubsub::tests::test_account_subscribe' panicked at rpc/src/rpc_pubsub.rs:934
  • bank::tests::test_rent_state_changes_sysvars' panicked at runtime/src/bank/tests.rs:9794
  • bank::tests::test_shrink_candidate_slots_cached' panicked at runtime/src/bank/tests.rs:5408
  • bank::tests::test_bank_cloned_stake_delegations' panicked at runtime/src/bank/tests.rs:3151
  • bank::tests::test_bank_hash_consistency' panicked at runtime/src/bank/tests.rs:5174
  • bank::tests::test_vote_epoch_panic' panicked at runtime/src/bank/tests.rs:8207
  • stake-accounts/src/stake_accounts.rs

still no clear reason for the local cluster failures, the logs all just say timeout

the most perplexing thing to fix in this pr was that no LocalCluster tests failed, they just all timed out. this was because they have a setup function to create a stake account via the stake program (which did not exist), but they "succeed" as long as the transaction is processed, then infinitely loop waiting to confirm the account has a nonzero balance

note we had test failures in cli/tests/stake.rs from the fact that compute cost for DelegateStake has some jitter, but we did not need to edit this file at all in this pr because these pass after #7447

@mergify
Copy link

mergify bot commented Aug 7, 2025

If this PR represents a change to the public RPC API:

  1. Make sure it includes a complementary update to rpc-client/ (example)
  2. Open a follow-up PR to update the JavaScript client @solana/kit (example)

Thank you for keeping the RPC clients in sync with the server API @2501babe.

@2501babe 2501babe force-pushed the 20250728_stakebpf branch 3 times, most recently from 16e51ec to d09ad5a Compare August 11, 2025 10:45
@2501babe 2501babe force-pushed the 20250728_stakebpf branch 3 times, most recently from 9504c99 to 246f038 Compare August 12, 2025 13:46
@2501babe 2501babe changed the title program-test: add bpf stake program 1.0.0 remove stake program builtin Aug 12, 2025
Comment on lines 515 to 516
#[test]
fn test_builtin_program_migration() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is kind of a shame to delete this whole test but i dont see how to realistically keep is since we no longer have any migrating programs. maybe make a mental note of the fact that we removed it, so we can add it back when we have a new migration

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's better to just refactor the BuiltinProgramsFilter a little bit so that, in a test environment, it can be mocked out, and we can keep the tests here and here around to make sure the scaffold works.

Might be a tall order though, considering the compile-time assertions involved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about this? we keep the important test and while it doesnt actually test anything it gives some automatic safety next time a migration is added back #7495

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that looks good but would prefer it to be part of this PR so that we can read the diff easier

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +64 to +70
(
solana_sdk_ids::stake::ID,
None,
include_bytes!("programs/core_bpf_stake-1.0.0.so"),
),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw im very pleased to report that all the pain of this pr was in dealing with non-ProgramTest setups that are suddenly missing the stake program. zero issues with any of the ProgramTest consumers that now have a different stake program

Comment on lines -878 to +877
let new_stake_authority = solana_pubkey::new_rand();
let stake_authority = Keypair::new();
let validator = Keypair::new();
let voter = Keypair::new();
Copy link
Member Author

@2501babe 2501babe Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test has a complicated diff but the tldr is it created a stake account and wanted to see pubsub updates when it was reauthorized, i just reauthorize the vote account instead for the same effect

Comment on lines +860 to +862
/// Test that lamports can be sent to stake accounts regardless of rewards period.
#[test]
fn test_program_execution_restricted_for_stake_account_in_reward_period() {
use solana_transaction_error::TransactionError::InstructionError;

fn test_rewards_period_system_transfer() {
Copy link
Member Author

@2501babe 2501babe Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic stripped out here is replaced by solana-program/stake#78, which tests all stake program instructions (except deactivate deliquent rip) instead of just withdraw

Comment on lines 3141 to 3158
// insert a stake account delegated to it, since we do not have a stake program
let stake_pubkey = Pubkey::new_unique();
let stake_account = stake_state::create_account_with_activation_epoch(
&stake_pubkey,
&vote_keypair.pubkey(),
&bank.get_account(&vote_keypair.pubkey()).unwrap(),
&Rent::default(),
stake_balance,
bank.epoch(),
);

bank.store_account(&stake_pubkey, &stake_account);

// the stakes cache is updated by any successful transaction that touches a stake account
let transaction =
system_transaction::transfer(&mint_keypair, &stake_pubkey, 1, bank.last_blockhash());

bank.process_transaction(&transaction).unwrap();
Copy link
Member Author

@2501babe 2501babe Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

highlighting this in case it is contentious. this is the only runtime test that uses a stake program interaction for something "real." but i believe my change is just as good in accomplishing the intent of the test

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't test the right thing anymore. The stake account is first added to the stakes cache by bank.store_account(&stake_pubkey, &stake_account) above. After the transfer you're not checking that the stake account balance in the stakes cache was increased by 1 lamport.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

youre right, i wrongly believed store_account() was a backdoor for tests that just adds it wth no side effects, knowing that ProgramTestContext function set_account() is a backdoor for tests

Comment on lines 5181 to 5190
assert_eq!(bank.epoch(), 0);
assert_eq!(
bank.hash().to_string(),
"AyXhbqmPsC46x7MHAuW89pQcNZVrUZnAND6ABWJ24svx",
"EzyLJJki4ALhQAq5wbmiNctDhytQckGJRXnk9APKXv7r",
);
}
Copy link
Member Author

@2501babe 2501babe Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i joked i feel like claude code just replacing the failing static value with the actual one, but per joe this is correct. the bank hash is different because we no longer have the builtin stake program account. but there is no consensus divergence because this is already the case on live clusters

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, lgtm. For context: #5331 (comment)

Comment on lines 302 to 311
let core_programs = core_bpf_programs(&genesis_config.rent, |_| true);
let stake_idx = core_programs
.iter()
.position(|(key, _)| *key == stake::program::id())
.unwrap();
let program = core_programs[stake_idx].1.clone();
let (programdata_address, programdata) = core_programs[stake_idx + 1].clone();

genesis_config.add_account(stake::program::id(), program);
genesis_config.add_account(programdata_address, programdata);
Copy link
Member Author

@2501babe 2501babe Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the same as what i originally did in LocalCluster (but changed it to load all core programs). but unlike there, just throwing the program in here is totally unobjectionable, because its just testing a stake cli

fwiw these are the only two places i needed the core program binary blobs outside ProgramTest, so i didnt consider it important to break out the core bpf programs into a new package. in fact i think its better they live there: circular dependencies create a barrier to dragging these magic elves (lol) all over the code base where they dont belong

Comment on lines +255 to +262
for core_program_account in &core_bpf_programs(&Rent::default(), |_| true) {
config
.additional_accounts
.push(core_program_account.clone());
}

Copy link
Member Author

@2501babe 2501babe Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im mildly concerned this is a hack, i dont know what mechanism a real cluster genesis uses to make sure it gets the core bpf programs, and if it is viable for LocalCluster to use it as well. i guess my view is "if we have some kind of test coverage of the multinode setup or similar, that ensures fresh clusters actually work, what we do in LocalCluster doesnt really matter. if LocalCluster is the only test surface we have over setting up clusters, probably we should use the formal mechanism that real clusters use to get core bpf programs"

also i assume that because it goes through the same kind of GenesisConfig setup as bank tests, LocalCluster is implicitly FeatureSet::all_enabled(), but i didnt find where this actually happens

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to add the Stake BPF program to the genesis setup, right? I'm assuming local cluster goes through this whole process.

https://github.com/2501babe/solana/blob/20250728_stakebpf/fetch-core-bpf.sh

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok! this is the mechanism for cluster genesis that i was not aware of 😅 also ive updated the binary name in the stake program release as i didnt know there was a convention for it

LocalCluster definitely does not call any shell scripts or pull anything down from github, no. i believe in master it just doesnt have any of the core bpf programs, but until this pr, it never uses any of them

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a reasonable way to do it though! Unless anyone has any opposition to the program-test dep here, wfm.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm opposed! program-test is for testing programs.. local-cluster is for setting up a cluster with staked nodes.. there should definitely be a way to setup a test cluster without program-test in the agave repo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is splitting the core program binaries into a separate package sufficient or do you think it should clone from github like a real cluster?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure what you mean. Could we add a dependency on solana-stake-program and use that to build the sbf binary? What are we currently doing for program test?

@codecov-commenter
Copy link

codecov-commenter commented Aug 12, 2025

Codecov Report

❌ Patch coverage is 59.48718% with 79 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.4%. Comparing base (04ea3ee) to head (855e90f).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #7203     +/-   ##
=========================================
- Coverage    83.4%    83.4%   -0.1%     
=========================================
  Files         814      814             
  Lines      366038   366031      -7     
=========================================
- Hits       305516   305443     -73     
- Misses      60522    60588     +66     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@2501babe 2501babe marked this pull request as ready for review August 12, 2025 17:15
@2501babe 2501babe requested a review from a team as a code owner August 12, 2025 17:15
@2501babe 2501babe requested a review from buffalojoec August 12, 2025 17:15
@2501babe 2501babe force-pushed the 20250728_stakebpf branch 4 times, most recently from 8d5ed9e to 4e60638 Compare August 20, 2025 09:51
Copy link

@buffalojoec buffalojoec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just needs a rebase and I'll stamp from my side.

@2501babe 2501babe force-pushed the 20250728_stakebpf branch 2 times, most recently from 3c75de9 to d36f10e Compare August 21, 2025 07:05
buffalojoec
buffalojoec previously approved these changes Aug 21, 2025
@buffalojoec
Copy link

@yihau can you accept solana-program-binaries?

@yihau
Copy link
Member

yihau commented Aug 21, 2025

green! 🫡

@2501babe 2501babe added the automerge automerge Merge this Pull Request automatically once CI passes label Aug 21, 2025
@2501babe
Copy link
Member Author

@anza-xyz/fees i think we need another fees approval for this to land

* remove stake program from `BUILTINS` and `MIGRATING_BUILTINS_COSTS`
* add bpf state program to `CORE_BPF_PROGRAMS`
* use bpf program in solana-test-validator, `ProgramTest`, and `LocalCluster`
* add stake program repo to core bpf genesis script
* break core bpf and spl binaries out into new solana-program-binaries crate
* fix bank-based tests that no longer have a stake program
@anza-team
Copy link
Collaborator

😱 New commits were pushed while the automerge label was present.

@anza-team anza-team removed the automerge automerge Merge this Pull Request automatically once CI passes label Aug 21, 2025
@apfitzge apfitzge self-requested a review August 21, 2025 15:33
Copy link

@apfitzge apfitzge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review for fee-related changes: lgtm

@2501babe 2501babe added the automerge automerge Merge this Pull Request automatically once CI passes label Aug 21, 2025
@mergify
Copy link

mergify bot commented Aug 21, 2025

automerge label removed due to a CI failure

@mergify mergify bot removed the automerge automerge Merge this Pull Request automatically once CI passes label Aug 21, 2025
@2501babe 2501babe added the automerge automerge Merge this Pull Request automatically once CI passes label Aug 21, 2025
@2501babe 2501babe merged commit eec3878 into anza-xyz:master Aug 21, 2025
61 of 62 checks passed
@2501babe 2501babe deleted the 20250728_stakebpf branch August 21, 2025 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge automerge Merge this Pull Request automatically once CI passes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants