further tune sled-agent memory budgeting #7911

@iximeow

the Omicron side of my RFD 413 refresh and an extension of #7448.

#7448 was resolved with an improvement to the VMM reservoir calculation, but there is more to improve here. in this thread Greg and I discussed the tradeoffs of being more precise with budget terms. we settled on not being as prescriptive as we could be for now, for a few reasons:

  • we'd want to start the control_plane_memory_earmark_mb setting much higher than it is: in the range of 170 GiB for current systems
  • we'd want to set vmm_reservoir_percentage much higher (90+%)
  • we'd really want to be able to tune the control plane earmark down once sled-agent knows which control plane services are not on its sled
  • we really want to better understand long-uptime and peak usage behaviors of all services

the first two points conflict with how poorly we've characterized the last: raising either setting that far assumes we know peak usage well enough to trust whatever headroom remains.
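to make the tradeoff concrete, here is a minimal sketch of how the two settings might interact. it assumes the reservoir is sized as a percentage of whatever physical memory remains after the control plane earmark is subtracted, which is a simplification and not necessarily the exact calculation sled-agent performs. the `MemoryBudget` struct, its methods, and the 1 TiB sled size are hypothetical; only the two setting names come from this issue, and `_mb` is treated as MiB here.

```rust
/// Hypothetical budget inputs. Only `control_plane_memory_earmark_mb` and
/// `vmm_reservoir_percentage` are names taken from this issue.
struct MemoryBudget {
    /// Assumed total physical memory on the sled, in MiB.
    physical_ram_mib: u64,
    /// Memory set aside for control plane services, treated as MiB.
    control_plane_memory_earmark_mb: u64,
    /// Share of the remaining memory handed to the VMM reservoir (0-100).
    vmm_reservoir_percentage: f64,
}

impl MemoryBudget {
    /// Memory left once the control plane earmark is taken out.
    fn after_earmark_mib(&self) -> u64 {
        self.physical_ram_mib
            .saturating_sub(self.control_plane_memory_earmark_mb)
    }

    /// Assumed model: the reservoir is a percentage of the post-earmark memory.
    fn reservoir_mib(&self) -> u64 {
        let available = self.after_earmark_mib() as f64;
        (available * self.vmm_reservoir_percentage / 100.0).floor() as u64
    }

    /// Whatever is neither earmarked nor in the reservoir. Under this model,
    /// everything not otherwise budgeted has to fit in here.
    fn headroom_mib(&self) -> u64 {
        self.after_earmark_mib() - self.reservoir_mib()
    }
}

fn main() {
    // A 1 TiB sled is a made-up example size.
    let budget = MemoryBudget {
        physical_ram_mib: 1024 * 1024,
        control_plane_memory_earmark_mb: 170 * 1024,
        vmm_reservoir_percentage: 90.0,
    };
    println!("reservoir: {} MiB", budget.reservoir_mib());
    println!("headroom:  {} MiB", budget.headroom_mib());
}
```

under these assumptions, a large earmark combined with a 90% reservoir leaves a fairly thin slice of unbudgeted memory, which is why confidence in peak usage matters before moving both settings.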

  • we can do better at measuring propolis (propolis#890, "propolis-server: want a better accounting of memory") and crucible (crucible#1692, "would be nice to report memory use through Oximeter") over time. this will give us more confidence about stability with a higher vmm_reservoir_percentage
  • tune control plane earmark down from an initial conservative guess. this is only half thought-through; if we size the reservoir too aggressively and need to shrink it to move a new CRDB instance onto a sled, there's clearly some overhead there. over long uptimes, presumably this would contribute to fragmenting the VMM reservoir too.
  • we should collect memory use of other control plane services to inform these calculations, but as with the Propolis and Crucible issues, collecting address space metrics currently induces stalls in the measured processes, so we wouldn't want to do this regularly and unattended.
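one way the "tune the earmark down" idea could look, as a rough sketch: start from the conservative earmark and subtract a per-service budget for each control plane service that is not placed on this sled. everything here is hypothetical, including the `ServiceBudget` type, the `tuned_earmark_mib` function, and the per-service numbers, which are placeholders and not measurements. it also ignores the cost of shrinking the reservoir later if a service moves onto the sled.

```rust
/// Hypothetical peak memory budget for one control plane service, in MiB.
struct ServiceBudget {
    name: &'static str,
    peak_mib: u64,
}

/// Reduce a conservative earmark by the budgets of services that are not
/// present on this sled. `present` lists the services placed here.
fn tuned_earmark_mib(
    conservative_earmark_mib: u64,
    budgets: &[ServiceBudget],
    present: &[&str],
) -> u64 {
    let absent_total: u64 = budgets
        .iter()
        .filter(|svc| !present.contains(&svc.name))
        .map(|svc| svc.peak_mib)
        .sum();
    conservative_earmark_mib.saturating_sub(absent_total)
}

fn main() {
    // Placeholder values only; real numbers would come from the measurement
    // work described above.
    let budgets = [
        ServiceBudget { name: "cockroachdb", peak_mib: 32 * 1024 },
        ServiceBudget { name: "clickhouse", peak_mib: 16 * 1024 },
        ServiceBudget { name: "nexus", peak_mib: 8 * 1024 },
    ];
    // Suppose this sled only hosts nexus.
    let earmark = tuned_earmark_mib(170 * 1024, &budgets, &["nexus"]);
    println!("tuned earmark: {} MiB", earmark);
}
```

the per-service budgets are exactly the numbers we don't have good data for yet, so this only becomes useful once the measurements are in hand.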

Labels: Sled Agent (Related to the Per-Sled Configuration and Management)
