
[BUG] Histogram aggregations can produce billions of empty buckets consuming lots of memory causing OOM issues #17702

@harshavamsi

Description


Describe the bug

Today in OpenSearch, we can run expensive queries that create lots of objects in memory. To combat this, we use the CircuitBreaker, which trips when memory pressure on the cluster builds up. The CircuitBreaker is attached to many services, including SearchService, so that expensive objects and arrays can report their memory usage and trip the breaker when they grow too large. Most aggregation queries use BigArrays to track ordinal counts and increment values; as new buckets are created, the BigArrays grow and report their memory usage to the CircuitBreaker, which may trip if required. We also have a second mechanism for limiting bucket counts, the search.max_buckets setting, which defaults to 65,536 and prevents more than that number of buckets from being created. Since each bucket has an assumed cost of 5KB, buckets are expensive to create.
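For illustration, here is a minimal sketch of the kind of per-bucket accounting that the search.max_buckets limit performs. The class and names below are illustrative only, not the actual MultiBucketConsumerService code:

import java.util.function.IntConsumer;

// Simplified stand-in for the bucket accounting OpenSearch performs during reduce.
final class SimpleBucketConsumer implements IntConsumer {
    private final int maxBuckets; // e.g. search.max_buckets, default 65,536
    private int count;

    SimpleBucketConsumer(int maxBuckets) {
        this.maxBuckets = maxBuckets;
    }

    @Override
    public void accept(int newBuckets) {
        count += newBuckets;
        if (count > maxBuckets) {
            // In OpenSearch this surfaces as a "too many buckets" exception; the real
            // consumer also asks the request circuit breaker to re-check memory pressure.
            throw new IllegalStateException(
                "Trying to create " + count + " buckets, limit is " + maxBuckets);
        }
    }
}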

A bug was reported to us while computing certain histogram aggregations. During the reduce phase, the coordinator performs a top-level reduce on the top-level aggregation in the query, which in turn calls its sub-aggregations' reduce recursively. Each aggregation type has an Internal representation used while it is being reduced. When InternalDateHistogram's reduce is called, it calls its reduceBuckets method, which calls reduceBucket, which in turn reduces each bucket's sub-aggregations. We call these non-empty buckets because they contain actual counts for values the aggregation produced. InternalDateHistogram's reduce then calls addEmptyBuckets when minDocCount == 0, which is the default when no minDocCount is specified. Here is where the bug lies: when we add empty buckets, we use the extendedBounds specified as part of the aggregation, and the extended bounds can be arbitrarily large. For example,

"extended_bounds": {
          "min": 0,
          "max": 1741558953724
        }

is an allowed extended bound that creates billions of empty buckets. These buckets are added in a loop to an iterator, which can cause massive memory consumption on the cluster: billions of empty buckets are created and their memory is reported neither to the CircuitBreaker nor against search.max_buckets. Here is a heap dump from an example histogram query, showing a massive number of InternalDateHistogram bucket objects being created:

Initiating dump at 2025-03-12 11:43:32.187402
6906:
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:     957930017    53644080952  org.opensearch.search.aggregations.bucket.histogram.InternalDateHistogram$Bucket
   2:        408193     9659489864  [Ljava.lang.Object; (java.base@...)
   3:       3789839     1383884968  [B (java.base@...)
   4:       1115012     1106714688  [I (java.base@...)
   5:       2352988      112943424  java.util.HashMap$Node (java.base@...)
   6:        875554       70044320  java.nio.DirectByteBufferR (java.base@...)
   7:       1935380       61932160  java.lang.String (java.base@...)
   8:         46908       52193552  [J (java.base@...)
"opensearch[node_id][search][T#1]" #287 daemon prio=5 os_prio=0 cpu=23337769.03ms elapsed=2814914.89s tid=0x0000ffee541559e0 nid=0x24cb runnable  [0x0000ffb9390bb000]
   java.lang.Thread.State: RUNNABLE
    at org.opensearch.common.Rounding$PreparedRounding.maybeUseArray(Rounding.java:425)
    at org.opensearch.common.Rounding$TimeUnitRounding.prepare(Rounding.java:518)
    at org.opensearch.common.Rounding.nextRoundingValue(Rounding.java:320)
    at org.opensearch.search.aggregations.bucket.histogram.InternalDateHistogram.nextKey(InternalDateHistogram.java:505)
    at org.opensearch.search.aggregations.bucket.histogram.InternalDateHistogram.addEmptyBuckets(InternalDateHistogram.java:412)

The loop effectively never terminates and causes the node processing the request to run out of memory, eventually taking the cluster down.
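For a sense of scale, here is a back-of-the-envelope calculation of how many empty buckets those extended bounds imply, assuming the 10s fixed interval from the reproduction query below; none of this is reported to the breaker today:

public class EmptyBucketCount {
    public static void main(String[] args) {
        long min = 0L;                   // extended_bounds.min
        long max = 1_741_558_953_724L;   // extended_bounds.max, epoch millis
        long intervalMillis = 10_000L;   // "interval": "10s"

        // addEmptyBuckets walks from min to max one rounded key at a time via
        // Rounding.nextRoundingValue, allocating an InternalDateHistogram.Bucket
        // (plus empty sub-aggregations) on every step.
        long emptyBuckets = (max - min) / intervalMillis + 1;
        System.out.println(emptyBuckets + " empty buckets"); // ~174 million
    }
}

Even a single reduce of this histogram therefore tries to materialize on the order of 174 million buckets, several orders of magnitude beyond the 65,536 allowed by search.max_buckets.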

Proposed Solution

  • After the non-empty buckets are created, trigger a call to reduceContext.consumeBucketsAndMaybeBreak(reducedBuckets.size()); to check whether we can afford to create empty buckets at all.
  • Before adding empty buckets, compute how many we would need to add. Since the exact count depends on the next key of each bucket, sample the key interval a few times to get an approximate bucket count.
  • Add the estimated empty bucket count to the breaker to see if it trips; otherwise report it against search.max_buckets to see if that trips.
  • Only if all of these checks pass do we create the empty buckets and process the query normally (see the sketch below).
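Here is a rough sketch of what such a pre-flight check could look like. Only reduceContext.consumeBucketsAndMaybeBreak exists today; the surrounding method, parameters, and the interval-based estimate are assumptions about how the fix might be shaped:

import java.util.function.IntConsumer;

// Hypothetical guard that would run before any empty bucket is materialized.
final class EmptyBucketGuardSketch {

    // consumeBucketsAndMaybeBreak stands in for
    // InternalAggregation.ReduceContext#consumeBucketsAndMaybeBreak(int).
    static void guardEmptyBuckets(long min, long max, long intervalMillis,
                                  int nonEmptyBuckets, IntConsumer consumeBucketsAndMaybeBreak) {
        // 1. Account for the non-empty buckets that were just reduced.
        consumeBucketsAndMaybeBreak.accept(nonEmptyBuckets);

        // 2. Estimate how many empty buckets the extended bounds imply; the real fix
        //    would sample a few nextKey() steps since calendar intervals vary in length.
        long estimatedEmpty = (max - min) / intervalMillis;

        // 3. Report the estimate before allocating anything, so either the request
        //    circuit breaker or search.max_buckets can reject the query up front.
        consumeBucketsAndMaybeBreak.accept((int) Math.min(estimatedEmpty, Integer.MAX_VALUE));

        // 4. Only if neither check throws do we fall through to the existing
        //    empty-bucket creation loop.
    }
}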

Related component

No response

To Reproduce

Sample query to reproduce the issue

GET _search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "log_time": {
              "from": 0,
              "to": 1741558953724,
              "include_lower": true,
              "include_upper": true,
              "format": "epoch_millis",
              "boost": 1.0
            }
          }
        },
        {
          "query_string": {
            "query": "app_name:nextgen_rules_engine   AND verb:(\"execute\")   AND type:api  AND tid:*  AND NOT msg: rules_engine_dynamic_configuration",
            "fields": [],
            "type": "best_fields",
            "default_operator": "or",
            "max_determinized_states": 10000,
            "enable_position_increments": true,
            "fuzziness": "AUTO",
            "fuzzy_prefix_length": 0,
            "fuzzy_max_expansions": 50,
            "phrase_slop": 0,
            "analyze_wildcard": true,
            "escape": false,
            "auto_generate_synonyms_phrase_query": true,
            "fuzzy_transpositions": true,
            "boost": 1.0
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "aggregations": {
    "2": {
      "date_histogram": {
        "field": "log_time",
        "format": "epoch_millis",
        "interval": "10s",
        "offset": 0,
        "order": {
          "_key": "asc"
        },
        "keyed": false,
        "extended_bounds": {
          "min": 0,
          "max": 1741558953724
        }
      },
      "aggregations": {
        "1": {
          "percentiles": {
            "field": "time_taken_ms",
            "percents": [
              95.0
            ],
            "keyed": true,
            "tdigest": {
              "compression": 100.0
            }
          }
        }
      }
    }
  }
}

Expected behavior

Queries should terminate with an exception instead of running for hours and taking down a node.

