- 
                Notifications
    You must be signed in to change notification settings 
- Fork 60
Allow configuring control_plane_storage_buffer #8099
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
8a3d538
              8320ec6
              647f280
              c432c93
              ccaf61a
              File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|  | @@ -275,6 +275,7 @@ pub struct MgdConfig { | |||||||||
| struct UnvalidatedTunables { | ||||||||||
| max_vpc_ipv4_subnet_prefix: u8, | ||||||||||
| load_timeout: Option<std::time::Duration>, | ||||||||||
| control_plane_storage_buffer_gb: u32, | ||||||||||
| } | ||||||||||
|  | ||||||||||
| /// Configuration for HTTP clients to external services. | ||||||||||
|  | @@ -303,6 +304,11 @@ pub struct Tunables { | |||||||||
| /// | ||||||||||
| /// If "None", nexus loops forever during initialization. | ||||||||||
| pub load_timeout: Option<std::time::Duration>, | ||||||||||
|  | ||||||||||
| /// The amount of disk space to reserve for non-Crucible / control plane | ||||||||||
| /// storage in gibibytes. This amount represents a buffer that the region | ||||||||||
|          | ||||||||||
| /// The amount of disk space to reserve for non-Crucible / control plane | |
| /// storage in gibibytes. This amount represents a buffer that the region | |
| /// The amount of space to reserve per-disk for non-Crucible storage (i.e., control plane | |
| /// storage) in gibibytes. This amount represents a buffer that the region | 
Trying to be really clear for people not already familiar with this.
        
          
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading this right: this is part of PackageConfig, which is documented as "configuration parameters known at compile-time"; as opposed to DeploymentConfig, which are runtime, deployment-specific tunables.  But isn't the point of doing it this way to make it runtime?
Edit: I think this is basically fine -- see my long comment below for more.
        
          
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pub control_plane_storage_buffer_gb: u32, | |
| pub control_plane_storage_buffer_gib: u32, | 
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|  | @@ -86,6 +86,9 @@ default_request_body_max_bytes = 1048576 | |||||
| # IPv4 subnetwork. This size allows for ~60 hosts. | ||||||
| max_vpc_ipv4_subnet_prefix = 26 | ||||||
|  | ||||||
| # For development environments, choose a zero sized storage buffer | ||||||
|          | ||||||
| # For development environments, choose a zero sized storage buffer | |
| # In development environments, reserve no space for control plane storage. | 
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|  | @@ -72,6 +72,9 @@ url = "postgresql://root@[::1]:32221/omicron?sslmode=disable" | |||||
| # IPv4 subnetwork. This size allows for ~60 hosts. | ||||||
| max_vpc_ipv4_subnet_prefix = 26 | ||||||
|  | ||||||
| # For development environments, choose a zero sized storage buffer | ||||||
|          | ||||||
| # For development environments, choose a zero sized storage buffer | |
| # In development environments, reserve no space for control plane storage. | 
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
|  | @@ -137,6 +137,7 @@ use nexus_db_model::DnsGroup; | |
| use nexus_db_queries::context::OpContext; | ||
| use nexus_db_queries::db::DataStore; | ||
| use nexus_types::deployment::PendingMgsUpdates; | ||
| use omicron_common::api::external::ByteCount; | ||
| use omicron_uuid_kinds::OmicronZoneUuid; | ||
| use oximeter::types::ProducerRegistry; | ||
| use std::collections::BTreeMap; | ||
|  | @@ -525,6 +526,7 @@ impl BackgroundTasksInitializer { | |
| inventory_watcher.clone(), | ||
| config.physical_disk_adoption.disable, | ||
| rack_id, | ||
| args.control_plane_storage_buffer, | ||
| ), | ||
| ), | ||
| opctx: opctx.child(BTreeMap::new()), | ||
|  | @@ -959,6 +961,10 @@ pub struct BackgroundTasksData { | |
| pub webhook_delivery_client: reqwest::Client, | ||
| /// Channel for configuring pending MGS updates | ||
| pub mgs_updates_tx: watch::Sender<PendingMgsUpdates>, | ||
| /// The amount of disk space to reserve for non-Crucible / control plane | ||
|          | ||
| /// storage. This amount represents a buffer that the region allocation query | ||
| /// will not use for each U2. | ||
| pub control_plane_storage_buffer: ByteCount, | ||
| } | ||
|  | ||
| /// Starts the three DNS-propagation-related background tasks for either | ||
|  | ||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | @@ -24,6 +24,13 @@ if_exists = "append" | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # [schema] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # schema_dir = "/var/nexus/schema/crdb" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|  | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| [tunables] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| max_vpc_ipv4_subnet_prefix = 26 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|  | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Reserve space for non-Crucible storage | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # See oxidecomputer/omicron#7875 for the 250G determination. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| control_plane_storage_buffer_gb = 250 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|          | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| /// Configuration for the "rack setup service". | |
| /// | |
| /// The Rack Setup Service should be responsible for one-time setup actions, | |
| /// such as CockroachDB placement and initialization. Without operator | |
| /// intervention, however, these actions need a way to be automated in our | |
| /// deployment. | |
| #[derive(Clone, Deserialize, Serialize, PartialEq, JsonSchema)] | |
| #[serde(try_from = "UnvalidatedRackInitializeRequest")] | |
| pub struct RackInitializeRequest { | |
| /// The set of peer_ids required to initialize trust quorum | |
| /// | |
| /// The value is `None` if we are not using trust quorum | |
| pub trust_quorum_peers: Option<Vec<Baseboard>>, | |
| /// Describes how bootstrap addresses should be collected during RSS. | |
| pub bootstrap_discovery: BootstrapAddressDiscovery, | |
| /// The external NTP server addresses. | |
| pub ntp_servers: Vec<String>, | |
| /// The external DNS server addresses. | |
| pub dns_servers: Vec<IpAddr>, | |
| /// Ranges of the service IP pool which may be used for internal services. | |
| // TODO(https://github.com/oxidecomputer/omicron/issues/1530): Eventually, | |
| // we want to configure multiple pools. | |
| pub internal_services_ip_pool_ranges: Vec<IpRange>, | |
| /// Service IP addresses on which we run external DNS servers. | |
| /// | |
| /// Each address must be present in `internal_services_ip_pool_ranges`. | |
| pub external_dns_ips: Vec<IpAddr>, | |
| /// DNS name for the DNS zone delegated to the rack for external DNS | |
| pub external_dns_zone_name: String, | |
| /// initial TLS certificates for the external API | |
| pub external_certificates: Vec<Certificate>, | |
| /// Configuration of the Recovery Silo (the initial Silo) | |
| pub recovery_silo: RecoverySiloConfig, | |
| /// Initial rack network configuration | |
| pub rack_network_config: RackNetworkConfig, | |
| /// IPs or subnets allowed to make requests to user-facing services | |
| #[serde(default = "default_allowed_source_ips")] | |
| pub allowed_source_ips: AllowedSourceIps, | |
| } | 
Wicket provides it with a hardcoded value of 250 here, where it constructs that struct:
omicron/wicketd/src/rss_config.rs
Lines 296 to 317 in 6e3fe96
| let request = RackInitializeRequest { | |
| trust_quorum_peers, | |
| bootstrap_discovery: BootstrapAddressDiscovery::OnlyThese( | |
| bootstrap_ips, | |
| ), | |
| ntp_servers: self.ntp_servers.clone(), | |
| dns_servers: self.dns_servers.clone(), | |
| internal_services_ip_pool_ranges, | |
| external_dns_ips: self.external_dns_ips.clone(), | |
| external_dns_zone_name: self.external_dns_zone_name.clone(), | |
| external_certificates: self.external_certificates.clone(), | |
| recovery_silo: RecoverySiloConfig { | |
| silo_name: Name::try_from(RECOVERY_SILO_NAME).unwrap(), | |
| user_name: UserId::try_from(RECOVERY_SILO_USERNAME).unwrap(), | |
| user_password_hash, | |
| }, | |
| rack_network_config, | |
| allowed_source_ips: self | |
| .allowed_source_ips | |
| .clone() | |
| .unwrap_or(AllowedSourceIps::Any), | |
| }; | 
Sled agent would write it into the "deployment" section of the Nexus config file, here:
omicron/sled-agent/src/services.rs
Lines 2665 to 2699 in 6e3fe96
| let deployment_config = DeploymentConfig { | |
| id: *id, | |
| rack_id: info.rack_id, | |
| techport_external_server_port: NEXUS_TECHPORT_EXTERNAL_PORT, | |
| dropshot_external: ConfigDropshotWithTls { | |
| tls: *external_tls, | |
| dropshot: dropshot::ConfigDropshot { | |
| bind_address: SocketAddr::new(*opte_ip, nexus_port), | |
| default_request_body_max_bytes: 1048576, | |
| default_handler_task_mode: | |
| HandlerTaskMode::Detached, | |
| log_headers: vec![], | |
| }, | |
| }, | |
| dropshot_internal: dropshot::ConfigDropshot { | |
| bind_address: (*internal_address).into(), | |
| default_request_body_max_bytes: 1048576, | |
| default_handler_task_mode: HandlerTaskMode::Detached, | |
| log_headers: vec![], | |
| }, | |
| internal_dns: nexus_config::InternalDns::FromSubnet { | |
| subnet: Ipv6Subnet::<RACK_PREFIX>::new( | |
| info.underlay_address, | |
| ), | |
| }, | |
| database: nexus_config::Database::FromDns, | |
| external_dns_servers: external_dns_servers.clone(), | |
| // TCP connections bound by HTTP clients of external services | |
| // should always be bound on our OPTE interface. | |
| external_http_clients: | |
| nexus_config::ExternalHttpClientConfig { | |
| interface: Some(opte_iface_name.to_string()), | |
| }, | |
| }; | 
For non-Wicket deployments (which includes helios-deploy, the "how-to-run" flow, and a4x2), these use the RSS config files in smf/sled-agent, and those would all specify 0 (or some other suitable number).
I thought this was similar to how some other config works (e.g., whether Nexus is configured for TLS), but it's slightly different:
- re: Nexus TLS: Sled Agent does write it into the config here, but it's based on the Nexus zone config, which itself is presumably determined by RSS based on the presence of certificates
- the name of the recovery silo and its user follow a similar flow except they get provided to Nexus with the RSS-to-Nexus handoff instead of being written into the Nexus config file
- the rack id is written into the Nexus config by Sled Agent but it's based on how the Sled was initialized, not the rack initialization request.
If we want this to come in as an RSS tunable, I think there are basically three flows:
- Put it into the sled config so every sled agent has it. Use that when Sled Agent writes Nexus's config file. Easy, works, but kind of gross because this isn't really sled config.
- Put it into the Nexus zone config and have RSS fill it in there, then have sled agent use that similar to how it fills in external_dns_servers,external_tls, etc. This is fine but a fair bit of work and also makes Sled Agent uses implicit interfaces with components it provisions #3407 slightly worse.
- Put it into the RSS-to-Nexus handoff. This is fine but means that it's actually dynamic in Nexus -- it'd be stored in the database and presumably re-read periodically or something. This too is a lot more work (though less than (2) and avoids making Sled Agent uses implicit interfaces with components it provisions #3407 any worse).
The main advantage to one of these approaches is that it's truly a deployment-time config. What's in this PR is still compile-time. It's just being determined late enough that we know which deployment we're targeting. This works okay right now because we already need different sets of packages for these deployments. I think we want to get away from needing different packages. But that will already be a bunch of work and this PR doesn't make it much harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just finished updating and testing this branch with a4x2, and you were spot on with your suspicion below: the 250 GiB storage buffer is applied in the a4x2 case.
My reasoning here was that the value(s) were known at "compile-time": it's either 0 or 250 GiB. But this doesn't work for a4x2, which uses the multi-sled nexus config-partial. This distinction I was missing was that multi-sled does not equal prod, and single sled does not equal dev.
I'm going to change this PR to use the method recommended in this comment, which will require a follow up testbed PR to change the config-rss.toml over there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davepacheco I started implementing 2, but ended up implementing 3 - I think it works better to store the value in the database.
        
          
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bigger question I have is: does a4x2 not use this file? or: have you tested that this does the right thing on a4x2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀