Potential memory leak on store-gateway metrics #4451

@roystchiang

Describe the bug
Store-gateway memory consumption and /metrics response time both increase the longer the pod stays running.

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex (master@090988c40f3eec21623713dd4403b3bbd46175c6)
  2. Run store-gateway in shuffle sharding mode
  3. Ingest data for multiple tenants (upwards of 4000)
  4. Call /metrics on store-gateway

Expected behavior

  • I expect memory usage and /metrics response time to stay roughly constant over time

Environment:

  • Infrastructure: kubernetes
  • Deployment tool: helm

Storage Engine

  • Blocks
  • Chunks

Additional Context

What I think is happening:

Potential solution:
Do you think it makes sense to keep a "global expired metric" that we update every time we sync the blocks?

We could periodically aggregate all the metrics that could have been dropped, instead of keeping all of them and recalculating the metrics every time. I would be happy to produce a PR if this solution works for you.
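
To make the proposal concrete, here is a minimal sketch of the "global expired metric" idea. It is not Cortex code: `Aggregator`, `Add`, `Expire`, `Total`, and `ActiveTenants` are hypothetical names, and a single float counter per tenant stands in for a full per-tenant registry. The point is only that when a tenant is dropped, its last value is folded into one running total, so the per-tenant state can be freed and the aggregation cost depends on active tenants only.

```go
package main

import (
	"fmt"
	"sync"
)

// Aggregator is a hypothetical stand-in for the per-tenant metrics
// aggregation; one float64 counter per tenant keeps the sketch small.
type Aggregator struct {
	mu      sync.Mutex
	active  map[string]float64 // counter values for tenants still loaded
	expired float64            // running total folded in from dropped tenants
}

func NewAggregator() *Aggregator {
	return &Aggregator{active: make(map[string]float64)}
}

// Add increments the counter of an active tenant.
func (a *Aggregator) Add(tenant string, v float64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.active[tenant] += v
}

// Expire folds the tenant's last value into the global expired total
// and deletes the per-tenant entry so its memory can be reclaimed.
// Expiring an unknown tenant is a no-op.
func (a *Aggregator) Expire(tenant string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if v, ok := a.active[tenant]; ok {
		a.expired += v
		delete(a.active, tenant)
	}
}

// Total returns the aggregated counter. The loop visits active tenants
// only; dropped tenants contribute through the single expired value.
func (a *Aggregator) Total() float64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	total := a.expired
	for _, v := range a.active {
		total += v
	}
	return total
}

// ActiveTenants reports how many per-tenant entries are still held.
func (a *Aggregator) ActiveTenants() int {
	a.mu.Lock()
	defer a.mu.Unlock()
	return len(a.active)
}

func main() {
	agg := NewAggregator()
	agg.Add("tenant-a", 10)
	agg.Add("tenant-b", 5)
	agg.Expire("tenant-a") // e.g. tenant no longer owned after a block sync
	fmt.Println(agg.Total(), agg.ActiveTenants()) // prints: 15 1
}
```

The aggregated counter stays monotonic after a tenant is removed, which is why the dropped values have to be kept somewhere. In this sketch they collapse into one number instead of one entry per removed tenant.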
