Description
Describe the bug
Today in OpenSearch, we can run expensive queries that create lots of objects in memory. To combat this, we use the CircuitBreaker, which trips when memory pressure on the cluster piles up. The CircuitBreaker is attached to many services, including SearchService, which allows expensive objects and arrays to report their memory usage and trip the CircuitBreaker when they grow too large. Most aggregation queries make use of BigArrays to track ordinal counts and increment values in the BigArrays. When new buckets are created, BigArrays grows in size and reports its memory usage to the CircuitBreaker, which may trip if required. We also have another mechanism for limiting large numbers of buckets: the search.max_buckets setting, which defaults to a maximum of 65,536 and prevents more than that number of buckets from being created. Since each bucket has an assumed cost of 5KB, buckets can be expensive to create.
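As a rough illustration of the two existing guards described above (the class below is a simplified stand-in, not the actual OpenSearch accounting code; the 5KB-per-bucket figure and the 65,536 default are taken from the description):

```java
// Simplified model of the two existing guards: the memory circuit breaker
// and the search.max_buckets limit. Not the real OpenSearch classes.
public class BucketAccountingSketch {
    private static final long ASSUMED_BYTES_PER_BUCKET = 5 * 1024; // ~5KB assumed per bucket
    private static final int MAX_BUCKETS = 65_536;                 // search.max_buckets default

    private final long breakerLimitBytes; // e.g. some fraction of the heap
    private long usedBytes;
    private int bucketCount;

    public BucketAccountingSketch(long breakerLimitBytes) {
        this.breakerLimitBytes = breakerLimitBytes;
    }

    /** Called whenever an aggregation materializes more buckets. */
    public void consumeBuckets(int newBuckets) {
        bucketCount += newBuckets;
        if (bucketCount > MAX_BUCKETS) {
            throw new IllegalStateException("too many buckets: " + bucketCount);
        }
        usedBytes += (long) newBuckets * ASSUMED_BYTES_PER_BUCKET;
        if (usedBytes > breakerLimitBytes) {
            throw new IllegalStateException("circuit breaker would trip at " + usedBytes + " bytes");
        }
    }
}
```

Both guards only work if the code that creates buckets actually reports to them, which is exactly what the empty-bucket path below fails to do.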
A bug that was reported to us involves computing certain histogram aggregations. During the reduce phase, the coordinator does a top-level reduce on the top-level aggregation in the query, which subsequently calls its sub-aggregations' reduce recursively. Each aggregation type has an Internal representation which represents the agg type while being reduced. When InternalDateHistogram's reduce is called, it calls its reduceBuckets method, which calls reduceBucket, which in turn calls reduce on each bucket's aggregations. We call these non-empty buckets because they contain actual counts for values that the aggregation produces. InternalDateHistogram's reduce then calls addEmptyBuckets when minDocCount == 0, which is the default case when no minDocCount is specified. Here is where the bug is: when we try to add empty buckets, we use the extendedBounds that are specified as part of the aggregation. The extended bounds can be arbitrarily large. For example,
"extended_bounds": {
"min": 0,
"max": 1741558953724
}
is an allowed extended bound that creates billions of empty buckets. These buckets are added in a loop to an iterator. This can cause massive memory consumption on the cluster, because billions and billions of empty buckets are created and their memory is never reported to the CircuitBreaker or counted against search.max_buckets. Here is a heap dump from such a histogram query (the full query is in the To Reproduce section below), showing a massive number of InternalDateHistogram Bucket objects being created:
Initiating dump at 2025-03-12 11:43:32.187402
6906:
num #instances #bytes class name (module)
-------------------------------------------------------
1: 957930017 53644080952 org.opensearch.search.aggregations.bucket.histogram.InternalDateHistogram$Bucket
2: 408193 9659489864 [Ljava.lang.Object; (java.base)
3: 3789839 1383884968 [B (java.base)
4: 1115012 1106714688 [I (java.base)
5: 2352988 112943424 java.util.HashMap$Node (java.base)
6: 875554 70044320 java.nio.DirectByteBufferR (java.base)
7: 1935380 61932160 java.lang.String (java.base)
8: 46908 52193552 [J (java.base)
"opensearch[node_id][search][T#1]" #287 daemon prio=5 os_prio=0 cpu=23337769.03ms elapsed=2814914.89s tid=0x0000ffee541559e0 nid=0x24cb runnable [0x0000ffb9390bb000]
java.lang.Thread.State: RUNNABLE
at org.opensearch.common.Rounding$PreparedRounding.maybeUseArray(Rounding.java:425)
at org.opensearch.common.Rounding$TimeUnitRounding.prepare(Rounding.java:518)
at org.opensearch.common.Rounding.nextRoundingValue(Rounding.java:320)
at org.opensearch.search.aggregations.bucket.histogram.InternalDateHistogram.nextKey(InternalDateHistogram.java:505)
at org.opensearch.search.aggregations.bucket.histogram.InternalDateHistogram.addEmptyBuckets(InternalDateHistogram.java:412)
The loop effectively never terminates, causing the node processing the request to run out of memory and eventually taking the cluster down.
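To put a rough number on it for the reproduction query below (10s interval, extended bounds from 0 to 1741558953724 epoch millis), a back-of-the-envelope estimate of the empty buckets addEmptyBuckets has to materialize:

```java
// Illustrative arithmetic only: estimates how many empty buckets the
// reproduction query's extended_bounds force addEmptyBuckets to create.
public class EmptyBucketEstimate {
    public static void main(String[] args) {
        long minBound = 0L;
        long maxBound = 1_741_558_953_724L; // extended_bounds.max (epoch millis)
        long intervalMillis = 10_000L;      // "interval": "10s"

        long emptyBuckets = (maxBound - minBound) / intervalMillis + 1;
        System.out.println(emptyBuckets);   // ~174 million buckets before any real data
    }
}
```

None of these buckets are reported to the breaker or counted against search.max_buckets, which is how the heap ends up holding hundreds of millions of Bucket instances (~53 GB in the dump above).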
Proposed Solution
- After non-empty buckets are created, we trigger a call to reduceContext.consumeBucketsAndMaybeBreak(reducedBuckets.size()); to check if we can even create empty buckets
- Before we can add empty buckets, we compute how many empty buckets we would need to add. We check the value at an interval by sampling a few times to get an approximate bucket count, since the counts depend on the next key for each bucket we find
- We add the empty bucket count to the breaker to see if it trips; otherwise we report it to search.max_buckets to see if that trips
- If we pass all these checks, we are allowed to create empty buckets and process the query normally (see the sketch below)
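A minimal sketch of that guard, assuming the empty-bucket count can be estimated by sampling the rounding's next key a few times before anything is allocated. The BucketConsumer interface below stands in for ReduceContext#consumeBucketsAndMaybeBreak; the class and method names here are illustrative, not the final implementation:

```java
import java.util.function.LongUnaryOperator;

public final class EmptyBucketGuard {

    /** Stand-in for InternalAggregation.ReduceContext#consumeBucketsAndMaybeBreak(int). */
    @FunctionalInterface
    public interface BucketConsumer {
        void consumeBucketsAndMaybeBreak(int count);
    }

    /**
     * Estimate how many empty buckets addEmptyBuckets would create between min and max
     * by sampling nextKey a few times, then report that estimate to the bucket consumer
     * (circuit breaker + search.max_buckets) before any Bucket objects are allocated.
     */
    public static void checkBeforeAddingEmptyBuckets(long min, long max, LongUnaryOperator nextKey,
                                                     int samples, BucketConsumer consumer) {
        // Sample a few consecutive keys to approximate the bucket width, since the exact
        // width can vary with the rounding (e.g. calendar-aware intervals).
        long key = min;
        long sampledWidth = 0;
        int sampled = 0;
        while (sampled < samples && key < max) {
            long next = nextKey.applyAsLong(key);
            if (next <= key) {
                return; // rounding is not advancing; nothing sensible to estimate
            }
            sampledWidth += next - key;
            key = next;
            sampled++;
        }
        if (sampled == 0) {
            return; // bounds already covered by existing buckets
        }
        long averageWidth = sampledWidth / sampled;
        long estimatedEmptyBuckets = (max - min) / averageWidth + 1;

        // Report the estimate before creating anything: this is where the breaker
        // or search.max_buckets gets the chance to trip and fail the request early.
        consumer.consumeBucketsAndMaybeBreak((int) Math.min(estimatedEmptyBuckets, Integer.MAX_VALUE));
    }
}
```

For the reproduction query below, the estimate is on the order of 174 million buckets, so the request fails fast instead of looping until the node runs out of memory.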
Related component
No response
To Reproduce
Sample query to reproduce the issue
GET _search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"range": {
"log_time": {
"from": 0,
"to": 1741558953724,
"include_lower": true,
"include_upper": true,
"format": "epoch_millis",
"boost": 1.0
}
}
},
{
"query_string": {
"query": "app_name:nextgen_rules_engine AND verb:(\"execute\") AND type:api AND tid:* AND NOT msg: rules_engine_dynamic_configuration",
"fields": [],
"type": "best_fields",
"default_operator": "or",
"max_determinized_states": 10000,
"enable_position_increments": true,
"fuzziness": "AUTO",
"fuzzy_prefix_length": 0,
"fuzzy_max_expansions": 50,
"phrase_slop": 0,
"analyze_wildcard": true,
"escape": false,
"auto_generate_synonyms_phrase_query": true,
"fuzzy_transpositions": true,
"boost": 1.0
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"aggregations": {
"2": {
"date_histogram": {
"field": "log_time",
"format": "epoch_millis",
"interval": "10s",
"offset": 0,
"order": {
"_key": "asc"
},
"keyed": false,
"extended_bounds": {
"min": 0,
"max": 1741558953724
}
},
"aggregations": {
"1": {
"percentiles": {
"field": "time_taken_ms",
"percents": [
95.0
],
"keyed": true,
"tdigest": {
"compression": 100.0
}
}
}
}
}
}
}
Expected behavior
Queries should terminate with an exception instead of running for hours and taking the node down.
Additional Details
Plugins
Please list all plugins currently enabled.
Screenshots
If applicable, add screenshots to help explain your problem.
Host/Environment (please complete the following information):
- OS: [e.g. iOS]
- Version [e.g. 22]
Additional context
Add any other context about the problem here.