@guojialiang92 guojialiang92 commented Apr 1, 2025

Description

Added a test that reproduces the current behavior: if the primary shard fails to publish a checkpoint, the replica shard and the primary shard fail to synchronize.
TransportReplicationAction now supports specifying a retryTimeout.
PublishCheckpointAction now uses a never-give-up retry strategy.
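
To illustrate the idea of a configurable retry timeout with a never-give-up option, here is a minimal, self-contained sketch. It is not the OpenSearch implementation: the class RetryTimeoutSketch, the method retry, the constant NEVER_GIVE_UP, and the fake clock are all hypothetical names made up for this example, and the loop retries immediately, whereas a real replication action would presumably wait for some event before retrying.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical sketch; none of these names come from OpenSearch.
final class RetryTimeoutSketch {
    // Assumed sentinel meaning "no deadline": keep retrying until success.
    static final long NEVER_GIVE_UP = Long.MAX_VALUE;

    /**
     * Calls attempt until it returns true or retryTimeoutMillis has elapsed
     * according to clockMillis. Returns the number of attempts made on
     * success, or -1 if the timeout expired first.
     */
    static int retry(BooleanSupplier attempt, long retryTimeoutMillis, LongSupplier clockMillis) {
        long start = clockMillis.getAsLong();
        int attempts = 0;
        while (true) {
            attempts++;
            if (attempt.getAsBoolean()) {
                return attempts;
            }
            long elapsed = clockMillis.getAsLong() - start;
            // With the sentinel, the deadline check is skipped entirely.
            if (retryTimeoutMillis != NEVER_GIVE_UP && elapsed >= retryTimeoutMillis) {
                return -1;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        long[] now = {0};
        // Fake clock that advances 10 ms on every reading, so the run is deterministic.
        LongSupplier fakeClock = () -> now[0] += 10;
        // Simulated publish that fails four times and succeeds on the fifth call.
        BooleanSupplier succeedsOnFifthCall = () -> ++calls[0] >= 5;

        // A 20 ms timeout expires before the fifth attempt is reached.
        System.out.println(retry(succeedsOnFifthCall, 20, fakeClock)); // prints -1

        calls[0] = 0;
        now[0] = 0;
        // Without a deadline, the same operation eventually succeeds.
        System.out.println(retry(succeedsOnFifthCall, NEVER_GIVE_UP, fakeClock)); // prints 5
    }
}
```

The contrast between the two calls mirrors the motivation stated above: with a finite timeout a transient failure can end the retries, while the never-give-up variant keeps trying until the operation goes through.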

Related Issues

Resolves 17595

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot commented Apr 1, 2025

❌ Gradle check result for 1edc0ca: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

…eckpointAction use the never give up strategy.

Signed-off-by: guojialiang <[email protected]>
@guojialiang92 guojialiang92 force-pushed the dev/PublishCheckpointAction_use_never_give_up_retry_strategy branch from 1edc0ca to e49aa81 Compare April 1, 2025 11:09
github-actions bot commented Apr 1, 2025

✅ Gradle check result for e49aa81: SUCCESS

@ashking94 ashking94 left a comment

lgtm

@github-actions
Contributor

❌ Gradle check result for b744f4b: FAILURE

Signed-off-by: guojialiang <[email protected]>
@guojialiang92 guojialiang92 force-pushed the dev/PublishCheckpointAction_use_never_give_up_retry_strategy branch from b744f4b to 68a5e9d Compare April 14, 2025 16:16
@github-actions
Contributor

❌ Gradle check result for 68a5e9d: FAILURE

…Action_use_never_give_up_retry_strategy

# Conflicts:
#	CHANGELOG.md
@github-actions
Contributor

❌ Gradle check result for 3eb976e: FAILURE

@ashking94
Member

❌ Gradle check result for 3eb976e: FAILURE

Restarted the PR build.

@github-actions
Contributor

❌ Gradle check result for 3eb976e: FAILURE

@github-actions
Contributor

❌ Gradle check result for a3a23a7: FAILURE

Signed-off-by: guojialiang <[email protected]>
@github-actions
Contributor

✅ Gradle check result for e7b926a: SUCCESS

@ashking94 ashking94 merged commit c44d230 into opensearch-project:main Apr 15, 2025
31 checks passed
Harsh-87 pushed a commit to Harsh-87/OpenSearch that referenced this pull request May 7, 2025
…h checkpoint tx action (opensearch-project#17749)

* TransportReplicationAction support specifying retryTimeout, PublishCheckpointAction use the never give up strategy.

Signed-off-by: guojialiang <[email protected]>

* support PublishCheckpointAction PUBLISH_CHECK_POINT_RETRY_TIMEOUT to override the default retry timeout

Signed-off-by: guojialiang <[email protected]>

* add TransportReplicationAction.getRetryTimeoutSetting

Signed-off-by: guojialiang <[email protected]>

* add entry to CHANGELOG.md

Signed-off-by: guojialiang <[email protected]>

* rewrite the PR title

Signed-off-by: guojialiang <[email protected]>

* modify changelog entry

Signed-off-by: guojialiang <[email protected]>

* add comments

Signed-off-by: guojialiang <[email protected]>

* update

Signed-off-by: guojialiang <[email protected]>

---------

Signed-off-by: guojialiang <[email protected]>
Signed-off-by: Harsh Kothari <[email protected]>

Labels

bug: Something isn't working
Indexing:Replication: Issues and PRs related to core replication framework, e.g. segrep

Development

Successfully merging this pull request may close these issues.

[BUG] segment replication stops when publish checkpoint fails

2 participants