propolis-server: want a better accounting of memory #890

@iximeow

memory accounting for a process with a VMM mapped, as measured by pmap -x (via /proc/self/xmap), does not give the most useful measurements. guest memory is mapped twice into virtual memory, so the virtual address space looks much larger than can ever be used. as a guest touches its available physical memory, those pages become tracked in propolis' HAT and show up in subsequent resident page reporting, even when backed by the reservoir. propolis accumulates many thousands of mappings, and any accounting holds the address space lock (exclusively! illumos issue), so measuring can be very impactful to the accounted process.

my motivation is to know, specifically, how many bytes of non-shared mappings back the non-VMM parts of a propolis-server address space. this is the memory that illumos will have charged to the Propolis process. a second question is how much of that is actually used; the difference tells us if things like Propolis' heap are larger than necessary. so as an example question that would be nice to answer: of a propolis-server with a "584 MiB" resident set, how much of that is a guest using part of its 1 GiB physical memory, and how much of that is the result of code in our control, in this repository?
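to make the question concrete, here is a minimal sketch of the split i'd like to be able to compute. the `Mapping` struct and its fields are hypothetical stand-ins for whatever a mapping walk yields; they are not the xmap record layout or anything from the memory-stats branch.

```rust
/// Hypothetical, simplified view of one mapping in the address space.
/// Field names are assumptions for illustration only.
#[derive(Debug, Clone, Copy)]
pub struct Mapping {
    /// Size of the mapping in bytes.
    pub size_bytes: u64,
    /// Bytes of the mapping currently resident.
    pub resident_bytes: u64,
    /// True if the mapping is shared rather than private.
    pub shared: bool,
    /// True if the mapping covers guest memory (the VMM parts).
    pub guest_memory: bool,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Summary {
    /// Non-shared, non-VMM bytes mapped: what the process is charged for.
    pub private_mapped_bytes: u64,
    /// Non-shared, non-VMM bytes resident: what is actually used.
    pub private_resident_bytes: u64,
    /// Resident bytes attributable to the guest touching its memory.
    pub guest_resident_bytes: u64,
}

/// Split a list of mappings into guest usage versus Propolis' own
/// non-shared memory. Shared non-guest mappings are ignored.
pub fn summarize(mappings: &[Mapping]) -> Summary {
    let mut s = Summary::default();
    for m in mappings {
        if m.guest_memory {
            s.guest_resident_bytes += m.resident_bytes;
        } else if !m.shared {
            s.private_mapped_bytes += m.size_bytes;
            s.private_resident_bytes += m.resident_bytes;
        }
    }
    s
}
```

the gap between `private_mapped_bytes` and `private_resident_bytes` is the "is the heap larger than necessary" signal, and `guest_resident_bytes` is the part of the resident set that is not under our control.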

this is particularly relevant for the recent VMM reservoir size tuning work; a worst case for memory overhead on a system is the maximum number of minimum-size VMs. if we should expect 100 MiB for Propolis with a collection of 1 GiB VMs, then we should leave at least ~10% of memory out of the reservoir. right now it's kinda annoying to measure this and make an educated decision.
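the arithmetic behind that estimate, as a rough sketch. the function name is made up for illustration, and the 100 MiB figure is the hypothetical from above, not a measurement.

```rust
/// Fraction of (guest memory + per-VM overhead) that has to live outside
/// the reservoir, assuming every VM has the same size and overhead.
/// Hypothetical helper for illustration only.
pub fn non_reservoir_fraction(guest_bytes: u64, overhead_bytes: u64) -> f64 {
    overhead_bytes as f64 / (guest_bytes + overhead_bytes) as f64
}

/// Per-VM overhead expressed relative to guest memory alone.
pub fn overhead_ratio(guest_bytes: u64, overhead_bytes: u64) -> f64 {
    overhead_bytes as f64 / guest_bytes as f64
}
```

with 1 GiB guests and 100 MiB of overhead each, `overhead_ratio` is about 0.098 and `non_reservoir_fraction` is about 0.089, so roughly 9 to 10% depending on which denominator you pick. this only covers Propolis overhead, not anything else the host needs outside the reservoir.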

my memory-stats branch is a draft of "better" and demonstrates the operational challenges. walking the process' mappings for accounting purposes means we're holding the process' address space lock. this takes a write lock rather than a read lock (unfortunate illumos issue), but even under a read lock it risks stalling guest operations purely for metrics that are only occasionally valuable.

  • we could reduce the amount of time spent collecting non-shared mappings with Patrick's idea of wiring up an MC_INVALIDATE command to memcntl(2)
    • the thousands of mappings for guest memory pages we accumulate in Propolis do us no favors; this would speed up accounting of a propolis address space a lot. even then, the accounting is really only suitable for testing or dogfood, not live metrics in real deployments.
  • maybe my draft/PoC memory accounting is still worth landing without automatically reporting as Oximeter stats, instead manually queried as part of testing/dogfood..?
  • seems like ideally illumos could count these high-level metrics of a process' memory and report them somehow. that would make measurement here simple(r) and more attainable
