
Conversation

@harshavamsi
Contributor

@harshavamsi commented Mar 27, 2025

Description

Most of the description is in #17702; this PR adds checks before we create empty buckets.

Before creating empty buckets, we compute how many buckets would be created and account for them in the CircuitBreaker, which can either trip the breaker or throw a TooManyBucketsException (the search.max_buckets limit).
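
As a rough illustration of that pre-check (a minimal sketch with illustrative names such as reserveEmptyBuckets, firstKey, lastKey, and interval, not the exact diff in this PR):

    // Minimal sketch, assuming org.opensearch.search.aggregations.InternalAggregation is imported.
    // Project how many buckets the key range would produce and account for the empty ones against
    // the reduce context before allocating them; consumeBucketsAndMaybeBreak either trips the
    // breaker or throws TooManyBucketsException once search.max_buckets would be exceeded.
    static void reserveEmptyBuckets(
        InternalAggregation.ReduceContext reduceContext,
        long firstKey,        // first bucket key after rounding/offset
        long lastKey,         // last bucket key after rounding/offset
        long interval,        // histogram interval, assumed > 0
        int existingBuckets   // buckets already returned by the shards
    ) {
        long projected = ((lastKey - firstKey) / interval) + 1;
        long emptyToAdd = projected - existingBuckets;
        if (emptyToAdd > 0) {
            reduceContext.consumeBucketsAndMaybeBreak(Math.toIntExact(emptyToAdd));
        }
    }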

Related Issues

Resolves #17702

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

❌ Gradle check result for 60c7b21: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@harshavamsi changed the title from "Initial commit to address reduce empty buckets bug" to "Fix addEmptyBuckets from creating too many buckets when given big extended bounds" Mar 27, 2025
@github-actions bot added the "bug" (Something isn't working) label Mar 27, 2025
@harshavamsi marked this pull request as ready for review March 27, 2025 23:06
@jainankitk
Contributor

@harshavamsi - Are you still working on this PR?

);
}

public static ReduceContext forFinalReduction(
Member

Do we need to get rid of this constructor and force consumers to provide a CircuitBreaker as a safety mechanism?
(If this is a breaking change, we can at least mark it for deprecation.)
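
If it does turn out to be breaking, one option is to keep the old factory as a deprecated overload, along the lines of the sketch below (the parameter list is assumed, not the exact ReduceContext signature):

    // Hedged sketch with assumed signatures: deprecate the breaker-less factory and delegate to the
    // breaker-aware variant with a NoopCircuitBreaker so existing callers keep compiling.
    @Deprecated
    public static ReduceContext forFinalReduction(
        BigArrays bigArrays,
        ScriptService scriptService,
        IntConsumer multiBucketConsumer,
        PipelineTree pipelineTreeRoot
    ) {
        return forFinalReduction(
            bigArrays,
            scriptService,
            multiBucketConsumer,
            pipelineTreeRoot,
            new NoopCircuitBreaker(CircuitBreaker.REQUEST) // assumed fallback when no breaker is supplied
        );
    }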

@kkhatua added the "v3.4.0" (Issues and PRs related to version 3.4.0) label Nov 5, 2025
@github-actions
Contributor

github-actions bot commented Nov 5, 2025

❌ Gradle check result for eaadff0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@harshavamsi closed this Nov 5, 2025
@github-project-automation bot moved this from In Progress to Done in Performance Roadmap Nov 5, 2025
@harshavamsi reopened this Nov 5, 2025
@github-project-automation bot moved this from Done to In Progress in Performance Roadmap Nov 5, 2025
@github-actions
Contributor

github-actions bot commented Nov 6, 2025

❌ Gradle check result for eaadff0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>
@github-actions
Contributor

github-actions bot commented Nov 6, 2025

❌ Gradle check result for 88c6733: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>
@github-actions
Contributor

github-actions bot commented Nov 6, 2025

❌ Gradle check result for 0dc14fa: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>
@github-actions
Contributor

github-actions bot commented Nov 6, 2025

❌ Gradle check result for 666d610: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@harshavamsi closed this Nov 6, 2025
@github-project-automation bot moved this from In Progress to Done in Performance Roadmap Nov 6, 2025
@harshavamsi reopened this Nov 6, 2025
@github-project-automation bot moved this from Done to In Progress in Performance Roadmap Nov 6, 2025
@github-actions
Contributor

github-actions bot commented Nov 6, 2025

✅ Gradle check result for 666d610: SUCCESS

@harshavamsi
Contributor Author

@jainankitk @bowenlan-amzn could you take a look again? I've addressed the concerns from before.

@bowenlan-amzn
Member

@harshavamsi could you add REST tests based on this #17718 (comment), and also one without the extended bounds in the query?

Comment on lines +404 to +406
if (bounds != null && bounds.getMin() != null && bounds.getMax() != null && !list.isEmpty()) {
    long min = min(bounds.getMin() + offset, list.getFirst().key);
    long max = max(bounds.getMax() + offset, list.getLast().key);
Member

I can see we are now checking the list.getFirst() and getLast() key values. However, it seems to me the bounds != null condition still limits the applicable scenario to the case where the user provides extended bounds.
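
For illustration, a bounds-independent version of that guard could start from the keys that are already present and only widen the range when extended bounds were supplied (a sketch reusing the local names from the snippet above, not a drop-in change):

    // Sketch: derive min/max from the existing buckets first (assuming list is non-empty, as in
    // the original condition), then widen with extended bounds if given, so the projected-bucket
    // check also applies when no extended bounds are in the query.
    long min = list.getFirst().key;
    long max = list.getLast().key;
    if (bounds != null && bounds.getMin() != null && bounds.getMax() != null) {
        min = Math.min(bounds.getMin() + offset, min);
        max = Math.max(bounds.getMax() + offset, max);
    }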

Comment on lines +503 to +505
if (postAddEmptyBucketCount > 0) {
    reduceContext.consumeBucketsAndMaybeBreak(postAddEmptyBucketCount);
}
Member

I remember we can account for negative values here:

public void accept(int value) {
    if (value != 0) {
        count += value;
        if (count > limit) {
            throw new TooManyBucketsException(
                "Trying to create too many buckets. Must be less than or equal to: ["
                    + limit
                    + "] but was ["
                    + count
                    + "]. This limit can be set by changing the ["
                    + MAX_BUCKET_SETTING.getKey()
                    + "] cluster level setting.",
                limit
            );
        }
    }
    callCount.increment();
    // tripping the circuit breaker for other threads in case of concurrent search
    // if the circuit breaker has tripped for one of the threads already, more info
    // can be found on: https://github.com/opensearch-project/OpenSearch/issues/7785
    if (circuitBreakerTripped) {
        throw new CircuitBreakingException(
            "Circuit breaker for this consumer has already been tripped by previous invocations. "
                + "This can happen in case of concurrent segment search when multiple threads are "
                + "executing the request and one of the thread has already tripped the circuit breaker",
            breaker.getDurability()
        );
    }
    // check parent circuit breaker every 1024 to (1024 + available processors) calls
    long sum = callCount.sum();
    if ((sum >= 1024) && (sum & 0x3FF) <= availProcessors) {
        try {
            breaker.addEstimateBytesAndMaybeBreak(0, "allocated_buckets");
        } catch (CircuitBreakingException e) {
            circuitBreakerTripped = true;
            throw e;
        }
    }
}

Comment on lines +134 to +137
    multiBucketConsumer,
    requireNonNull(pipelineTreeRoot, "prefer EMPTY to null"),
    () -> pipelineTreeRoot,
    breaker
Member

I realize the multiBucketConsumer of the reduce context already has the breaker in it. Should we just use that?


Labels

  • backport 1.3 (Backport to 1.3 branch)
  • backport 2.x (Backport to 2.x branch)
  • backport 3.0
  • bug (Something isn't working)
  • stalled (Issues that have stalled)
  • v3.1.0
  • v3.4.0 (Issues and PRs related to version 3.4.0)

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

[BUG] Histogram aggregations can produce billions of empty buckets consuming lots of memory causing OOM issues

6 participants