Skip to content

Conversation

@ajleong623
Copy link
Contributor

@ajleong623 ajleong623 commented Aug 20, 2025

Description

This change is a rebased and cleaned version of another pr #18601

Related Issues

Resolves #18377, resolves #18438

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Anthony Leong <[email protected]>
@ajleong623 ajleong623 requested a review from a team as a code owner August 20, 2025 17:15
@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc Search:Query Insights labels Aug 20, 2025
@github-actions
Copy link
Contributor

❌ Gradle check result for ab74705: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Anthony Leong <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for ad9e107: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Anthony Leong <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for 9e3af03: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Anthony Leong <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for 7e9374a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for a345b21: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Anthony Leong <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for ae2c661: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for c9ad00b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Anthony Leong <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for 32a2fae: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Anthony Leong <[email protected]>
Copy link
Contributor

@kaushalmahi12 kaushalmahi12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are some intial comments, will go through rest of it in couple of days

Comment on lines 545 to 547
} else {
return "system_failure";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets move the numeric values into well defined constants/Enums then directly use, StatusCode.from(getStatusFamilyCode)

Comment on lines 105 to 107
String successType = RestStatus.ACCEPTED.getErrorType();
String userFailureType = RestStatus.BAD_REQUEST.getErrorType();
String systemErrorType = RestStatus.INTERNAL_SERVER_ERROR.getErrorType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the only RestStatus codes we need to track or all of them ? It would be better to encapsulate these into a collection to avoid tracking any new additions. That will help avoid any changes in this class.
It can easily be written more generically

for (RestStatus status: supportedStatusCodes) {
      builder.field(status.getErrorType(), errorTypeCounts.getOrDefault(status.getErrorType(), (long) 0));
}

Copy link
Contributor Author

@ajleong623 ajleong623 Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea to group the possible RestStatus values was mentioned in #18438 (comment). I think it simplifies what the user views. I will add the new enums in a different file.

* @opensearch.api
*/
@PublicApi(since = "1.0.0")
public class SearchResponseStatusStats implements Writeable, ToXContentFragment {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty much duplicate of the DocStatusStats class. Can we make a generic class which will help us remove this duplicity

}
}

indicesService.addDocStatusStats(stats);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are replacing the previous stats completely then why not just use the same method with updated logic.

Comment on lines 334 to 335
cancellationListener.onResponse(searchResponse);
indicesService.getSearchResponseStatusStats().inc(searchResponse.status());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not clear on whether if you should put these instructions in this order it will be correct and might be flaky since it might have already sent the response via the httpChannel which is responsible for this request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I am not 100% sure if the code will continue to be run after the original listener receives the SearchResponse. I think it is safer to switch the orders just in case.

stats.inc(RestStatus.OK);

indicesService.addDocStatusStats(stats);
indicesService.addNewDocStatusStats(stats);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these changes can be avoided when you change the existing method instead.

Copy link
Contributor Author

@ajleong623 ajleong623 Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An issue is that I could not directly change the signature of the existing method because of backwards compatibility. addDocStatusStats would have to taken in the new class for DocStatusStats. I could try to rewrite the method but with a different signature which is what I should have probably done in the first place.

this.isThrottled = isThrottled;
this.throttleTimeInMillis = throttleTimeInMillis;
this.docStatusStats = docStatusStats;
this.docStatusStats = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are deprecating this construct then do we really need the ctor changes ?

@github-actions
Copy link
Contributor

❌ Gradle check result for b544e35: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@ajleong623
Copy link
Contributor Author

@kaushalmahi12 Thank you so much for looking into my pr and reviewing it. The idea is that I would like to track search failures in some kind of counter which will be stored in IndicesService. All search requests are handled in TransportSearchAction, so that will be where the counters are updated.

Additionally, #8716 implemented a change where document response statuses are counted. Based on some comments from #18438 (comment) I refactored the logic as described in #18438 (comment).

I have addressed a couple of the comments but will continue to work through them. Thank you again for the review!

@github-actions
Copy link
Contributor

❌ Gradle check result for 3455efb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Anthony Leong <[email protected]>
@ajleong623
Copy link
Contributor Author

@kaushalmahi12 I believe the requested changes have been addressed. I have made a new class AbstractStatusStats which holds the common methods between DocStatusStats and StatusCounterStats. Additionally, I believe I was able to replace the addDocStatusStats method in IndicesService without breaking backwards compatibility.

@github-actions
Copy link
Contributor

❌ Gradle check result for a2aa335: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Anthony Leong <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for 93f508f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for eb19367: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Anthony Leong <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for 536c20b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Anthony Leong <[email protected]>
@github-actions
Copy link
Contributor

✅ Gradle check result for de8b27d: SUCCESS

@ajleong623
Copy link
Contributor Author

@kaushalmahi12 I believe the comments you suggested have been addressed. Could you please confirm whether that is true or not since I did not want to resolve the comments without making sure they are truly resolved? Also, I noticed that this PR is scheduled for a 3.3.0 release

@github-actions github-actions bot added bug Something isn't working Indexing Indexing, Bulk Indexing and anything related to indexing labels Oct 1, 2025
@sandeshkr419 sandeshkr419 added v3.4.0 Issues and PRs related to version 3.4.0 and removed v3.3.0 labels Oct 1, 2025
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Nov 2, 2025
@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Nov 10, 2025
@github-actions
Copy link
Contributor

❌ Gradle check result for 917722c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing Search:Query Insights Search Search query, autocomplete ...etc v3.4.0 Issues and PRs related to version 3.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Refactor DocStatusStats [Feature Request] Track non-successful Search API calls across coordinator nodes

4 participants