feat(sdk)!: detect stale nodes #2254
Conversation
Walkthrough
The changes introduced in this pull request primarily focus on enhancing error handling and retry logic within the DAPI client and SDK. Key modifications include the replacement of the …
Changes
Possibly related PRs
Suggested reviewers
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (9)
packages/rs-dapi-client/src/lib.rs (1)
74-88: Approved: Excellent improvements to error handling and retry logic
The changes to the `CanRetry` trait are well-implemented and align with the PR objectives. The new `can_retry` method provides more flexibility in error handling, while the deprecation of `is_node_failure` maintains backward compatibility.
A minor suggestion for improvement: consider adding a brief example in the documentation for `can_retry` to illustrate its usage, especially how to handle the `None` case. This would make the trait clearer for developers using it.
Example:

    /// Returns true if the operation can be retried.
    /// `None` means unspecified - in this case, you should either inspect the error
    /// in more details, or assume `false`.
    ///
    /// # Example
    /// ```
    /// match error.can_retry() {
    ///     Some(true) => retry_operation(),
    ///     Some(false) => handle_non_retryable_error(),
    ///     None => {
    ///         // Inspect error in more detail or assume false
    ///         if should_retry_based_on_error_details(error) {
    ///             retry_operation()
    ///         } else {
    ///             handle_non_retryable_error()
    ///         }
    ///     }
    /// }
    /// ```
    fn can_retry(&self) -> Option<bool>;

packages/rs-sdk/src/error.rs (1)
84-100: LGTM: `CanRetry` trait implementation looks good.
The implementation correctly identifies retryable errors as per the PR objectives, covering stale proofs, DAPI client errors, core client errors, and timeout errors. The use of the `matches!` macro results in concise and readable code.
Consider adding a comment explaining the rationale behind categorizing these specific errors as retryable. This would make the logic easier for other developers to understand, maintain, and extend.
Example:

    impl CanRetry for Error {
        fn can_retry(&self) -> Option<bool> {
            // The following errors are considered retryable:
            // - Stale proofs: The node might have updated data on a subsequent request
            // - DAPI and Core client errors: These might be transient network issues
            // - Timeout errors: A retry might succeed if the timeout was due to temporary congestion
            let retry = matches!(
                self,
                Error::Proof(drive_proof_verifier::Error::StaleProof(..))
                    | Error::DapiClientError(_)
                    | Error::CoreClientError(_)
                    | Error::TimeoutReached(_, _)
            );
            if retry {
                Some(true)
            } else {
                None
            }
        }
    }

packages/rs-sdk/Cargo.toml (1)
9-9: LGTM! Consider specifying features for chrono.
The addition of chrono as a regular dependency aligns well with the PR objectives, particularly for implementing time-based checks for stale proofs. The version specified (0.4.38) is relatively recent, which is good for security and feature support.
Consider specifying only the features you need for chrono to minimize the dependency footprint. For example:

    -chrono = { version = "0.4.38" }
    +chrono = { version = "0.4.38", default-features = false, features = ["clock"] }

This assumes you only need the `clock` feature. Adjust the features based on your specific requirements.
packages/rs-dapi-client/src/transport/grpc.rs (2)
Line range hint 120-141: Approved: Improved retry logic with more flexibility
The changes to the `can_retry` method are well-implemented and align with the PR objectives. The use of `Option<bool>` provides more flexibility in error handling, and the inverted logic makes it clearer which cases are retryable.
Consider adding a brief comment explaining the logic behind the `retry` variable, e.g.:

    // Determine if retry is possible based on the status code
    // We retry for all codes except those explicitly listed
    let retry = !matches!(
        code,
        Ok | DataLoss | Cancelled | Unknown | DeadlineExceeded | ResourceExhausted | Aborted | Internal | Unavailable
    );

This would enhance code readability and make the intention behind the logic more explicit.
Line range hint 1-541: Suggestion: Improve file organization and documentation
While the changes in this file are focused on the `can_retry` method, some general improvements could enhance the overall quality of the file:
- Consider grouping the `impl_transport_request_grpc!` macro invocations by client type (Platform and Core) using comments or regions for better organization.
- Add a brief documentation comment at the beginning of the file explaining its purpose and the main components it contains.
- Consider adding a TODO comment to explore whether any of the repeated `impl_transport_request_grpc!` macro invocations can be further abstracted or generated programmatically to reduce code duplication.
These suggestions are not directly related to the current changes but could improve the overall maintainability of the file in the future.
packages/rs-dapi-client/src/dapi_client.rs (1)
Line range hint 236-246: Handle `None` cases explicitly when checking `error.can_retry()`
Using `error.can_retry().unwrap_or(false)` treats `None` as `false`, which may not reflect the intended behavior for indeterminate retry cases. To prevent unintended consequences, handle the `None` case explicitly.
Consider applying this change to handle `None` explicitly:

    - if !error.can_retry().unwrap_or(false) {
    + match error.can_retry() {
    +     Some(true) => {
    +         // Error is retryable; no action needed here
    +     },
    +     Some(false) => {
    +         // Error is not retryable; ban the address if necessary
    +         if applied_settings.ban_failed_address {
    +             let mut address_list = self
    +                 .address_list
    +                 .write()
    +                 .expect("can't get address list for write");
    +
    +             address_list.ban_address(&address)
    +                 .map_err(DapiClientError::<<R::Client as TransportClient>::Error>::AddressList)?;
    +         }
    +     },
    +     None => {
    +         // Indeterminate retryability; decide how to handle this case
    +         // Possibly log or take default action
    +     },
    + }

packages/rs-sdk/src/sdk.rs (3)
641-641: Incomplete 'FIXME' Comment
The comment at line 641 seems incomplete: `// FIXME: in future, we need to implement t`. Please complete or clarify the intended message.
751-751: Typo in Documentation Comment
In the documentation comment at line 751, "fefault" should be "default".
919-919: Typo in Documentation Comment
In the documentation comment at line 919, "fefault" should be "default".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
- packages/rs-dapi-client/src/dapi_client.rs (3 hunks)
- packages/rs-dapi-client/src/lib.rs (1 hunks)
- packages/rs-dapi-client/src/transport/grpc.rs (2 hunks)
- packages/rs-drive-proof-verifier/src/error.rs (1 hunks)
- packages/rs-sdk/Cargo.toml (1 hunks)
- packages/rs-sdk/src/error.rs (2 hunks)
- packages/rs-sdk/src/sdk.rs (13 hunks)
🧰 Additional context used
🔇 Additional comments (4)
packages/rs-sdk/src/error.rs (2)
8-8: LGTM: Import statement updated correctly.
The addition of `CanRetry` to the import statement is consistent with the new trait implementation and aligns with the PR objectives of improving error handling and retry logic.
Line range hint 1-100: Overall assessment: Changes align well with PR objectives.
The modifications to this file effectively implement the desired improvements in error handling and retry logic. The new `CanRetry` trait implementation provides a clear mechanism for identifying retryable errors, which should enhance the SDK's ability to handle transient issues.
packages/rs-drive-proof-verifier/src/error.rs (1)
85-89: Verify that all error handling covers the new `StaleProof` variant
The addition of the `StaleProof` variant to the `Error` enum may require updates in parts of the codebase where `Error` is matched or handled. Ensure that all pattern matches and error handling logic have been updated accordingly to incorporate this new variant.
Run the following script to locate all match statements involving `Error`:
This will help identify any areas that need to be updated to handle `StaleProof` appropriately.
packages/rs-dapi-client/src/dapi_client.rs (1)
42-49: Clarify `None` handling in `can_retry` method
The `can_retry` method returns `Option<bool>`, where `None` indicates indeterminate retry capability. Currently, the `Mock` variant returns `None`. Please confirm whether returning `None` for `Mock` errors is intentional, or consider returning `Some(true)` or `Some(false)` to specify the retry behavior explicitly.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (5)
packages/rs-drive-proof-verifier/src/error.rs (1)
95-96: Minor improvement for consistency in error messages
The error messages in `StaleProofError` are informative and well-structured. However, there's a small inconsistency in terminology between the two variants:
- `StaleProofHeight` uses "received" in both the field name and error message.
- `Time` (soon to be `StaleProofTime`) uses "received" in the field name but "actual" in the error message.
For better consistency, consider using "received" in both error messages.
Suggested change for the `Time` variant's error message:

    #[error(
        "stale proof time: expected {expected_timestamp_ms}ms, received {received_timestamp_ms} ms, tolerance {tolerance_ms} ms; try another server"
    )]

This small change will make the error messages more consistent across both variants.
Also applies to: 105-107
packages/rs-sdk/src/sdk.rs (4)
104-121: LGTM! New fields for stale proof detection.
The addition of `previous_proof_height`, `proof_height_tolerance`, and `proof_time_tolerance_ms` fields to the `Sdk` struct implements the stale proof detection mechanism as described in the PR objectives. This is a good approach to ensure the freshness of received proofs.
Consider adding documentation comments for these new fields to explain their purpose and default values, especially for `proof_time_tolerance_ms`, which is set to `None` by default.
577-615: Approve time verification logic with a suggestion.
The `verify_proof_time` function correctly implements the time-based stale proof detection. However, there's a potential issue with using the local system time.
Consider using a more reliable time source or implementing a time synchronization mechanism. The current implementation assumes that the local system time is accurate, which may not always be the case. An inaccurate clock could lead to false positives or false negatives in stale proof detection.
One possible approach is to use a Network Time Protocol (NTP) client to periodically synchronize the local time with a reliable time server. This would help ensure more accurate time comparisons.
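The tolerance check discussed above can be sketched as follows. This is a minimal illustration and not the PR's actual `verify_proof_time`: the names `StaleTime`, `check_time`, and `local_now_ms` are hypothetical, and the sketch assumes the comparison is a symmetric absolute difference in milliseconds. Passing the current time in as a parameter keeps the check independent of the clock source, so a synchronized clock could be substituted for the local one.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical error carrying the values needed to report a stale timestamp.
#[derive(Debug, PartialEq)]
struct StaleTime {
    expected_timestamp_ms: u64,
    received_timestamp_ms: u64,
    tolerance_ms: u64,
}

/// Assumed rule: the response is fresh if |now - received| <= tolerance.
fn check_time(now_ms: u64, received_ms: u64, tolerance_ms: u64) -> Result<(), StaleTime> {
    // `abs_diff` cannot overflow, even for values near `u64::MAX`.
    if now_ms.abs_diff(received_ms) > tolerance_ms {
        return Err(StaleTime {
            expected_timestamp_ms: now_ms,
            received_timestamp_ms: received_ms,
            tolerance_ms,
        });
    }
    Ok(())
}

/// Local wall-clock time in milliseconds since the Unix epoch.
fn local_now_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0) // clock set before the epoch: treated as 0 in this sketch
}
```

A caller would typically invoke `check_time(local_now_ms(), received_ms, tolerance_ms)`; tests can pass fixed values for `now_ms` instead.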
742-754: LGTM! Configuration options for stale proof detection.
The addition of `proof_height_tolerance` and `proof_time_tolerance_ms` fields to the `SdkBuilder` struct, along with their respective `with_*` methods, gives users good flexibility to configure the stale proof detection mechanism.
In the `with_proof_time_tolerance` method documentation, "default" is misspelled as "fefault". Please correct this typo.

    - /// This is set to `None` by fefault.
    + /// This is set to `None` by default.

Also applies to: 903-930
1090-1216: LGTM! Comprehensive test coverage for stale proof detection.
The new test module provides good coverage for the `verify_proof_height` and `verify_proof_time` functions. The use of `test_matrix` allows for testing multiple scenarios efficiently.
Consider adding a few more edge cases to the `test_verify_proof_time` function:
- Test with `u64::MAX` for `received` and `now` to ensure no overflow issues.
- Test with very large tolerance values to ensure they're handled correctly.
These additional tests would help ensure the robustness of the time verification logic under extreme conditions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- packages/rs-dapi-client/src/dapi_client.rs (3 hunks)
- packages/rs-drive-proof-verifier/src/error.rs (1 hunks)
- packages/rs-sdk/src/sdk.rs (13 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/rs-dapi-client/src/dapi_client.rs
🧰 Additional context used
🔇 Additional comments (4)
packages/rs-drive-proof-verifier/src/error.rs (2)
85-89: LGTM: StaleProof variant added correctly
The addition of the `StaleProof` variant to the `Error` enum is well-implemented. The use of `#[error(transparent)]` and `#[from]` attributes ensures proper error propagation and conversion from `StaleProofError`. This change aligns with the PR objective of detecting stale proofs.
85-115: Summary: Solid implementation of stale proof detection
The changes introduced in this file effectively implement the stale proof detection mechanism as outlined in the PR objectives. The new `StaleProof` variant in the `Error` enum and the `StaleProofError` enum provide a robust structure for handling and reporting stale proof errors.
The implementation is well thought out, with detailed error messages that include all relevant information. The suggested improvements are minor and focus on enhancing consistency and clarity in naming and error messages.
Overall, this is a strong addition to the error handling system that will improve the ability to detect and respond to stale proofs in the system.
packages/rs-sdk/src/sdk.rs (2)
617-679: LGTM! Robust height verification logic.
The `verify_proof_height` function implements a solid approach to height-based stale proof detection. It handles various edge cases well, including:
- Same height proofs
- Initial SDK startup (when `prev` is less than `tolerance`)
- Concurrent updates from multiple threads
The use of atomic operations ensures thread-safety, which is crucial for this shared state.
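The three cases listed above can be illustrated with a small sketch built on `std::sync::atomic::AtomicU64`. It is written under assumptions rather than copied from the PR: `StaleHeight` and `check_height` are hypothetical names, and the exact comparison (reject when the received height is more than `tolerance` blocks behind the highest height seen so far) is inferred from the review text.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical error describing a response that lags too far behind.
#[derive(Debug, PartialEq)]
struct StaleHeight {
    expected_height: u64,
    received_height: u64,
    tolerance_blocks: u64,
}

/// Checks `received` against the highest height seen so far and, if it is
/// newer, records it. `previous` may be shared between threads.
fn check_height(previous: &AtomicU64, received: u64, tolerance: u64) -> Result<(), StaleHeight> {
    let mut prev = previous.load(Ordering::Relaxed);

    // Same height as last time: nothing to verify or update.
    if received == prev {
        return Ok(());
    }

    // Right after startup `prev` is smaller than `tolerance`, so the
    // lower-bound check is skipped until enough history exists.
    if prev >= tolerance && received < prev - tolerance {
        return Err(StaleHeight {
            expected_height: prev,
            received_height: received,
            tolerance_blocks: tolerance,
        });
    }

    // Compare-and-swap loop: only ever raise the stored height. If another
    // thread stored a value in the meantime, retry against that value.
    while received > prev {
        match previous.compare_exchange_weak(prev, received, Ordering::Relaxed, Ordering::Relaxed) {
            Ok(_) => break,
            Err(actual) => prev = actual,
        }
    }
    Ok(())
}
```

The loop exits either when this thread's value is stored or when another thread has already stored an equal or higher one, so the recorded height never moves backward.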
Line range hint 1-1216: Overall LGTM! Solid implementation of stale proof detection.
The changes in this file successfully implement the stale proof detection mechanism as described in the PR objectives. Key points:
- The `Sdk` struct now includes fields for tracking and configuring proof staleness.
- New helper functions `verify_proof_time` and `verify_proof_height` implement the core logic for detecting stale proofs.
- The `SdkBuilder` provides configuration options for users to customize the stale proof detection behavior.
- Comprehensive test coverage ensures the reliability of the new functionality.
These changes will help mitigate the issue of receiving outdated information when fetching data from nodes that have experienced a chain halt.
To ensure that the new stale proof detection mechanism is properly integrated, let's verify its usage in other parts of the codebase:
This will help confirm that the new functionality is being used correctly throughout the project.
✅ Verification successful
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

    #!/bin/bash
    # Search for usage of verify_proof_metadata
    rg --type rust 'verify_proof_metadata'
    # Search for usage of with_proof_height_tolerance and with_proof_time_tolerance
    rg --type rust 'with_proof_height_tolerance|with_proof_time_tolerance'

Length of output: 751
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
packages/rs-sdk/src/sdk.rs (4)
243-248: LGTM: Enhanced proof verification in `parse_proof_with_metadata_and_proof`.
The addition of the `verify_proof_metadata` call in the `parse_proof_with_metadata_and_proof` method improves the robustness of proof verification. This change aligns well with the new fields added to the `Sdk` struct.
Consider adding a comment explaining the purpose of the `verify_proof_metadata` call for better code readability.
577-615: LGTM: Well-implemented `verify_proof_time` function.
The `verify_proof_time` function correctly checks if the proof time is within the specified tolerance of the current time. It includes appropriate logging and returns a detailed error when the time is out of tolerance.
Consider using a constant for the log level (e.g., `warn!`) instead of hardcoding it, to make it easier to adjust logging verbosity in the future.
617-677: LGTM: Well-implemented `verify_proof_height` function with proper concurrency handling.
The `verify_proof_height` function correctly checks if the proof height is within the specified tolerance of the previous proof height. It uses atomic operations to handle concurrent access to the previous proof height, ensuring thread safety. The function includes appropriate logging and returns a detailed error when the height is out of tolerance.
Consider adding a comment explaining the purpose of the while loop (lines 666-674) for better code readability, as the atomic compare-and-swap logic might not be immediately obvious to all readers.
1088-1214: LGTM: Comprehensive test coverage for new proof verification functionality.
The new test module provides excellent coverage for the `verify_proof_height` and `verify_proof_time` functions, as well as the behavior of cloned SDK instances with respect to proof height verification. The use of test matrices allows for thorough testing of various input combinations, covering both valid and invalid cases.
Consider adding a few more edge cases to the `test_verify_proof_time` test matrix, such as very large time values or the maximum possible `u64` value, to ensure robustness against potential overflow scenarios.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- packages/rs-drive-proof-verifier/src/error.rs (1 hunks)
- packages/rs-sdk/src/sdk.rs (13 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/rs-drive-proof-verifier/src/error.rs
🧰 Additional context used
🔇 Additional comments (4)
packages/rs-sdk/src/sdk.rs (4)
104-121: LGTM: New fields for proof validation.
The addition of `previous_proof_height`, `proof_height_tolerance`, and `proof_time_tolerance_ms` fields to the `Sdk` struct enhances the SDK's ability to validate proof freshness. These fields are well-typed and appropriately scoped for their intended use.
250-267: LGTM: Well-implemented `verify_proof_metadata` method.
The `verify_proof_metadata` method effectively utilizes the new fields to perform comprehensive proof validation. It checks both proof height and time when the respective tolerances are set, with appropriate error handling.
Line range hint 740-928: LGTM: Well-implemented additions to `SdkBuilder` for configuring proof tolerances.
The additions to the `SdkBuilder` struct and its implementation, including the new fields `proof_height_tolerance` and `proof_time_tolerance_ms`, along with their corresponding configuration methods `with_proof_height_tolerance` and `with_proof_time_tolerance`, provide a clean and flexible way to set up proof validation tolerances during SDK initialization. These changes are consistent with the new fields added to the `Sdk` struct.
The comments for the `with_proof_time_tolerance` method provide valuable information about the implications of enabling time-based proof checks. This is excellent documentation that will help users make informed decisions when configuring the SDK.
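The builder pattern described above might look roughly like the sketch below. The types `StalenessConfig` and `StalenessBuilder` are hypothetical stand-ins, and the method signatures (taking `Option<u64>`) are assumptions; the method names come from this review, and only the `None` default for the time tolerance is stated in it.

```rust
/// Hypothetical stand-in for the tolerance-related part of the SDK configuration.
#[derive(Debug, Clone, PartialEq, Default)]
struct StalenessConfig {
    /// Maximum allowed height lag, in blocks; `None` disables the check.
    /// The real default is not stated in this review; `None` is a placeholder.
    height_tolerance: Option<u64>,
    /// Maximum allowed clock difference, in milliseconds; `None` (the default
    /// mentioned in the review) disables the check.
    time_tolerance_ms: Option<u64>,
}

/// Hypothetical builder mirroring the `with_*` methods named in the review.
#[derive(Debug, Default)]
struct StalenessBuilder {
    config: StalenessConfig,
}

impl StalenessBuilder {
    /// Sets the height tolerance; assumed signature.
    fn with_proof_height_tolerance(mut self, tolerance: Option<u64>) -> Self {
        self.config.height_tolerance = tolerance;
        self
    }

    /// Sets the time tolerance in milliseconds; assumed signature.
    fn with_proof_time_tolerance(mut self, tolerance_ms: Option<u64>) -> Self {
        self.config.time_tolerance_ms = tolerance_ms;
        self
    }

    /// Finishes configuration and returns the immutable settings.
    fn build(self) -> StalenessConfig {
        self.config
    }
}
```

Each method consumes and returns the builder, so calls can be chained, for example `StalenessBuilder::default().with_proof_height_tolerance(Some(1)).build()`.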
Line range hint 1-1214: Overall assessment: Excellent implementation of proof staleness detection.
The changes introduced in this file significantly enhance the SDK's ability to detect stale proofs received from nodes. The implementation aligns well with the PR objectives and includes:
- New fields in the `Sdk` struct for tracking and configuring proof validation.
- Enhanced proof verification in existing methods.
- New functions for verifying proof time and height.
- Updates to the `SdkBuilder` for flexible configuration of proof validation tolerances.
- Comprehensive test coverage for the new functionality.
The implementation is thread-safe, well-documented, and provides users with the flexibility to configure proof staleness detection according to their needs. These changes should effectively address the issue of receiving outdated information when fetching data from nodes that have experienced a chain halt.
@coderabbitai full review
✅ Actions performed
Full review triggered.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (5)
packages/rs-sdk/src/error.rs (2)
70-73: LGTM: New `StaleNode` variant added correctly.
The addition of the `StaleNode` variant to the `Error` enum aligns well with the PR objective of detecting stale proofs. The use of `#[from]` and `#[error(transparent)]` attributes is appropriate for error conversion and propagation.
Consider adding a brief doc comment to explain the purpose of this variant, e.g.:

    /// Error indicating that the node provided stale data
    #[error(transparent)]
    StaleNode(#[from] StaleNodeError),

100-125: LGTM: `StaleNodeError` enum defined correctly with comprehensive error handling.
The `StaleNodeError` enum is well-defined and aligns with the PR objective of detecting stale proofs based on height and time. Both variants (`Height` and `Time`) include detailed error messages and relevant fields, which will be helpful for debugging and error reporting.
The use of the `#[error]` attribute for custom error formatting is appropriate, and the inclusion of doc comments for the enum and its fields is commendable.
Consider adding a brief example in the doc comment for the `StaleNodeError` enum to illustrate its usage, e.g.:

    /// Server returned stale metadata
    ///
    /// # Examples
    ///
    /// ```
    /// use your_crate::StaleNodeError;
    ///
    /// let height_error = StaleNodeError::Height {
    ///     expected_height: 100,
    ///     received_height: 90,
    ///     tolerance_blocks: 5,
    /// };
    ///
    /// let time_error = StaleNodeError::Time {
    ///     expected_timestamp_ms: 1000,
    ///     received_timestamp_ms: 900,
    ///     tolerance_ms: 50,
    /// };
    /// ```
    #[derive(Debug, thiserror::Error)]
    pub enum StaleNodeError {
        // ... (rest of the enum definition)
    }

This addition would enhance the documentation and provide a clear usage example for developers.
packages/rs-dapi-client/src/transport/grpc.rs (1)
Line range hint 120-132: Approved: Improved error handling and retry logic
The changes to the `CanRetry` trait implementation for `dapi_grpc::tonic::Status` are well thought out and align with the PR objectives. The new `can_retry` method name is more descriptive, and the inverted logic using the `matches!` macro improves readability and maintainability.
Consider adding a blank line after the `use dapi_grpc::tonic::Code::*;` statement for better code readability:

     fn can_retry(&self) -> bool {
         let code = self.code();
         use dapi_grpc::tonic::Code::*;
    +
         !matches!(
             code,
             Ok | DataLoss | Cancelled | Unknown | DeadlineExceeded | ResourceExhausted | Aborted | Internal | Unavailable
         )
     }

packages/rs-dapi-client/src/dapi_client.rs (2)
Line range hint 236-242: Re-evaluate Address Banning on Non-Retryable Errors
In the error handling block, when `!error.can_retry()` is `true` (i.e., the error is non-retryable), the code bans the failed address:

    if !error.can_retry() {
        if applied_settings.ban_failed_address {
            // Ban the address
        }
    }

This might lead to banning addresses due to non-recoverable errors, which could be outside the node's control (e.g., client-side issues).
Consider modifying the logic to ban addresses only on retryable errors that persist, indicating a problem with the node:

    -if !error.can_retry() {
    +if error.can_retry() {

This change ensures that only addresses causing persistent, retryable errors are banned, improving the robustness of the client.
272-274: Log Retryable Errors Appropriately
Currently, only non-retryable errors are logged at the error level:

    if !error.can_retry() {
        tracing::error!(?error, "request failed");
    }

Consider also logging retryable errors, perhaps at the `warn` level, to provide visibility into issues that require retries:

     if !error.can_retry() {
         tracing::error!(?error, "request failed");
    +} else {
    +    tracing::warn!(?error, "retryable error occurred");
     }

This enhancement aids in monitoring and debugging by highlighting errors that cause retries.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- packages/rs-dapi-client/src/dapi_client.rs (3 hunks)
- packages/rs-dapi-client/src/lib.rs (1 hunks)
- packages/rs-dapi-client/src/transport/grpc.rs (1 hunks)
- packages/rs-sdk/src/error.rs (3 hunks)
- packages/rs-sdk/src/sdk.rs (15 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/rs-dapi-client/src/lib.rs
- packages/rs-sdk/src/sdk.rs
🧰 Additional context used
🔇 Additional comments (3)
packages/rs-sdk/src/error.rs (2)
8-8: LGTM: Import statement updated correctly.
The import statement has been properly updated to include the `CanRetry` trait, which is consistent with its implementation for the `Error` enum later in the file.
88-98: LGTM: `CanRetry` trait implemented correctly.
The implementation of the `CanRetry` trait for the `Error` enum is well done. It correctly includes the new `StaleNode` error in the list of retryable errors, along with other relevant error types. The use of the `matches!` macro is appropriate and efficient for this kind of check.
This implementation aligns well with the PR objective of improving retry logic in the SDK.
packages/rs-dapi-client/src/dapi_client.rs (1)
42-47: Verify `can_retry` Logic for All Error Variants
The `can_retry` method returns `true` for `NoAvailableAddresses` and `AddressList` errors, and delegates to `transport_error.can_retry()` for `Transport` errors. For `Mock` errors, it returns `false`.
Please confirm that this logic aligns with the intended retry policy:
- `NoAvailableAddresses`: Retrying may not be effective if there are truly no available addresses. Consider whether it should return `false` to prevent unnecessary retries.
- `AddressList` errors: Assess whether these errors are recoverable through retries.
Ensuring the `can_retry` logic accurately reflects which errors are retryable will prevent futile retries and potential resource wastage.
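The per-variant policy under discussion can be made concrete with a simplified sketch. The trait and enums below are stand-ins written for illustration: `TransportError` and its variants are hypothetical, the payload types are assumptions, and only the four client error variant names and their current return values are taken from the comment above.

```rust
/// Simplified stand-in for the retry trait discussed in this review.
trait CanRetry {
    fn can_retry(&self) -> bool;
}

/// Hypothetical transport error; the real one wraps a gRPC status.
#[derive(Debug)]
enum TransportError {
    Transient,
    Permanent,
}

impl CanRetry for TransportError {
    fn can_retry(&self) -> bool {
        matches!(self, TransportError::Transient)
    }
}

/// Simplified client error with the four variants named above.
#[derive(Debug)]
enum ClientError<TE> {
    Transport(TE),
    NoAvailableAddresses,
    AddressList(String),
    Mock(String),
}

impl<TE: CanRetry> CanRetry for ClientError<TE> {
    fn can_retry(&self) -> bool {
        match self {
            // Delegate to the transport layer's own policy.
            ClientError::Transport(e) => e.can_retry(),
            // Currently treated as retryable; the review asks to confirm this.
            ClientError::NoAvailableAddresses | ClientError::AddressList(_) => true,
            // Mock errors are never retried.
            ClientError::Mock(_) => false,
        }
    }
}
```

Changing the policy for `NoAvailableAddresses`, as the comment suggests considering, would mean moving that variant to the `false` arm.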
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (9)
packages/rs-dapi-client/src/lib.rs (1)
74-85: Excellent improvements to the `CanRetry` trait!
The changes to the `CanRetry` trait are well-implemented and improve its functionality:
- The new `can_retry` method provides a clear and simple interface.
- The deprecation of `is_node_failure` is handled gracefully, maintaining backward compatibility while encouraging the use of the new method.
- The trait documentation has been updated to reflect these changes.
These modifications enhance the trait's clarity and align well with Rust best practices.
Consider adding a brief explanation in the trait-level documentation about when an operation might not be retryable. This could provide valuable context for users of the trait.

     /// Returns true if the operation can be retried.
    +///
    +/// Some operations may not be safely retried due to potential side effects or other constraints.
    +/// Implementors of this trait should carefully consider the conditions under which an operation
    +/// can be safely retried.
     pub trait CanRetry {
         /// Returns true if the operation can be retried safely.
         fn can_retry(&self) -> bool;

packages/rs-sdk/Cargo.toml (1)
9-9: LGTM! Consider adding a comment for clarity.
The addition of `chrono` as a regular dependency is appropriate given the new time-based proof validation feature mentioned in the PR description. The version specified (0.4.38) is recent and exact, which is good for reproducibility.
Consider adding a brief comment explaining why `chrono` is needed, e.g.:

    # Required for time-based proof validation
    chrono = { version = "0.4.38" }

This will help future maintainers understand the purpose of this dependency.
packages/rs-sdk/src/error.rs (3)
packages/rs-sdk/src/error.rs (3)
70-73: LGTM: New `StaleNode` variant added correctly.
The `StaleNode` variant has been appropriately added to the `Error` enum, using the `#[from]` attribute for automatic conversion from `StaleNodeError`. This addition aligns well with the PR objective of detecting stale proofs from nodes.
Consider adding a brief doc comment to explain the purpose of this new error variant, for example:

    /// Error indicating that the node provided stale data
    #[error(transparent)]
    StaleNode(#[from] StaleNodeError),
88-98: LGTM: `CanRetry` trait implemented correctly.
The `CanRetry` trait has been properly implemented for the `Error` enum. The `can_retry` method correctly identifies which error types are eligible for retry attempts.
To improve maintainability, consider defining a constant array of retryable error variants:

    impl CanRetry for Error {
        const RETRYABLE_ERRORS: [Error; 4] = [
            Error::StaleNode(..),
            Error::DapiClientError(_),
            Error::CoreClientError(_),
            Error::TimeoutReached(_, _),
        ];

        fn can_retry(&self) -> bool {
            Self::RETRYABLE_ERRORS.iter().any(|e| std::mem::discriminant(e) == std::mem::discriminant(self))
        }
    }

This approach centralizes the list of retryable errors, making it easier to maintain and update in the future.
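Note that the array form suggested above would not compile as written: `Error::StaleNode(..)` is a pattern rather than an expression, and an `impl` of a trait can only define associated constants that the trait itself declares. A sketch that still keeps the list in one place is shown below. It uses simplified stand-in payload types, a hypothetical extra `Config` variant to demonstrate the non-retryable case, and a hypothetical helper named `is_retryable_variant`; none of these are taken from the PR.

```rust
/// Simplified stand-in for the retry trait discussed in this review.
trait CanRetry {
    fn can_retry(&self) -> bool;
}

/// Stand-in error enum; the payload types here are placeholders.
#[derive(Debug)]
enum Error {
    StaleNode(String),
    DapiClientError(String),
    CoreClientError(String),
    TimeoutReached(u64, String),
    /// Hypothetical non-retryable variant, added only for the example.
    Config(String),
}

impl Error {
    /// Single place that lists the retryable variants. Patterns such as
    /// `Error::StaleNode(..)` are valid here because `matches!` takes a pattern.
    fn is_retryable_variant(&self) -> bool {
        matches!(
            self,
            Error::StaleNode(..)
                | Error::DapiClientError(_)
                | Error::CoreClientError(_)
                | Error::TimeoutReached(_, _)
        )
    }
}

impl CanRetry for Error {
    fn can_retry(&self) -> bool {
        self.is_retryable_variant()
    }
}
```

Because `matches!` works on patterns, no placeholder values have to be constructed for each variant, which the array-plus-`discriminant` approach would require.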
100-125: LGTM: `StaleNodeError` enum added with comprehensive error information.
The `StaleNodeError` enum has been well-designed to provide detailed information about stale node errors. Both `Height` and `Time` variants include all necessary fields to understand the nature of the staleness.
For consistency with Rust naming conventions, consider changing the field names in the `Time` variant to snake_case:

    Time {
        /// Expected time in milliseconds - is local time when the message was received
        expected_timestamp_ms: u64,
        /// Time received from the server in the message, in milliseconds
        received_timestamp_ms: u64,
        /// Tolerance in milliseconds
        tolerance_ms: u64,
    }

This change would make the field names consistent with those in the `Height` variant.
packages/rs-dapi-client/src/transport/grpc.rs (1)
Line range hint 120-133: Approve changes with a minor suggestion for improvement
The refactoring of the `CanRetry` implementation for `dapi_grpc::tonic::Status` is well done. The new method name `can_retry` is more descriptive and aligns better with the trait name. The inverted logic using the `matches!` macro improves readability and maintainability.
However, to further enhance code organization and readability, consider extracting the list of status codes that prevent retrying into a constant or a separate function. This would make the main `can_retry` function more concise and easier to understand at a glance.
Here's a suggested refactoring:

    impl CanRetry for dapi_grpc::tonic::Status {
        fn can_retry(&self) -> bool {
            !is_non_retryable_status(self.code())
        }
    }

    // The fully qualified path avoids relying on a separate `tonic` import.
    fn is_non_retryable_status(code: dapi_grpc::tonic::Code) -> bool {
        use dapi_grpc::tonic::Code::*;
        matches!(
            code,
            Ok | DataLoss | Cancelled | Unknown | DeadlineExceeded | ResourceExhausted | Aborted | Internal | Unavailable
        )
    }

This refactoring separates the concern of determining which status codes are non-retryable from the main `can_retry` logic, making the code more modular and easier to maintain.
packages/rs-sdk/src/sdk.rs (3)
762-762: Typo in Error Message
There's a typo in the error message: "data conttact cache size must be positive". It should be "data contract cache size must be positive".
Apply this diff to fix the typo:

```diff
-.expect("data conttact cache size must be positive"),
+.expect("data contract cache size must be positive"),
```
888-922: Clarify Documentation for Tolerance Methods
The documentation for the `with_height_tolerance` and `with_time_tolerance` methods is comprehensive. However, consider adding examples that demonstrate how to use these methods effectively and explain the implications of different tolerance settings on staleness detection.
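To make the effect of a height tolerance concrete, here is a minimal standalone sketch. The function `check_height` and the struct `StaleHeight` are hypothetical and are not the SDK's API; the sketch assumes that a tolerance of `None` disables the check and that a response is stale when its height lags the highest height seen so far by more than the tolerance.

```rust
/// Hypothetical error value, loosely modelled on the `Height` variant described in this review.
#[derive(Debug, PartialEq, Eq)]
pub struct StaleHeight {
    pub expected_height: u64,
    pub received_height: u64,
    pub tolerance_blocks: u64,
}

/// Compares a received height with the highest height seen so far.
/// `tolerance = None` is assumed to turn the check off.
pub fn check_height(
    expected_height: u64,
    received_height: u64,
    tolerance: Option<u64>,
) -> Result<(), StaleHeight> {
    let tolerance_blocks = match tolerance {
        Some(t) => t,
        None => return Ok(()),
    };
    // saturating_sub keeps the lower bound at 0 while expected_height <= tolerance.
    let lowest_accepted = expected_height.saturating_sub(tolerance_blocks);
    if received_height < lowest_accepted {
        return Err(StaleHeight {
            expected_height,
            received_height,
            tolerance_blocks,
        });
    }
    Ok(())
}
```

Under these assumptions, a larger tolerance accepts nodes that lag further behind, while a tolerance of 0 rejects any response below the highest height already observed.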
1081-1218: Test Coverage for Edge Cases
The added tests in the `test` module are valuable for verifying the staleness logic. To enhance test coverage, consider adding more edge case scenarios, such as:
- Testing the behavior when `metadata_height_tolerance` or `metadata_time_tolerance_ms` is `None`.
- Simulating scenarios where the `expected_height` is exactly equal to `tolerance`.
- Verifying the behavior when system time moves backward.
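The time-related edge cases listed above can be explored with a small standalone sketch. `is_time_stale` is a hypothetical helper, not the SDK's implementation; it assumes that a missing tolerance disables the check and that the clock difference is compared as an absolute value, so a local clock that moved backward is treated the same as one that is ahead.

```rust
/// Hypothetical helper: returns true if the server timestamp differs from the
/// local time by more than the tolerance. `None` is assumed to disable the check.
pub fn is_time_stale(now_ms: u64, received_ms: u64, tolerance_ms: Option<u64>) -> bool {
    match tolerance_ms {
        None => false,
        // abs_diff avoids underflow when the local clock is behind the server time.
        Some(tolerance) => now_ms.abs_diff(received_ms) > tolerance,
    }
}
```

Because the comparison is strict, a difference exactly equal to the tolerance is still accepted in this sketch; whether the boundary should be inclusive is a design choice worth pinning down in a test.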
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- packages/rs-dapi-client/src/dapi_client.rs (3 hunks)
- packages/rs-dapi-client/src/lib.rs (1 hunks)
- packages/rs-dapi-client/src/transport/grpc.rs (1 hunks)
- packages/rs-sdk/Cargo.toml (1 hunks)
- packages/rs-sdk/src/error.rs (3 hunks)
- packages/rs-sdk/src/sdk.rs (15 hunks)
🧰 Additional context used
🔇 Additional comments (6)
packages/rs-sdk/src/error.rs (2)
8-8: LGTM: Import statement updated correctly.
The import statement has been appropriately updated to include `CanRetry` along with `DapiClientError`. This change aligns with the new `CanRetry` trait implementation for the `Error` enum.
Line range hint 1-125: Summary: Excellent implementation of stale node detection and error handling.
The changes in this file effectively implement the stale node detection mechanism as described in the PR objectives. The new `StaleNodeError` enum, along with the `StaleNode` variant in the `Error` enum, provides a robust way to handle and report stale node issues. The implementation of the `CanRetry` trait enhances the SDK's ability to handle retryable errors gracefully.
These modifications will significantly improve the SDK's resilience when dealing with nodes that may have experienced a chain halt or are otherwise out of sync. The clear error messages and detailed error information will aid in debugging and maintaining the system.
Overall, the changes are well-implemented, properly documented, and align closely with the PR objectives. Great job!
packages/rs-dapi-client/src/dapi_client.rs (2)
42-47: Implementation of `can_retry()` is appropriate
The `can_retry()` method for `DapiClientError` correctly determines the retryability of different error types, which is essential for the retry logic.
272-274: Logging non-retryable errors enhances observability
Appropriately logging errors that are not retryable helps in monitoring and diagnosing issues that cannot be automatically recovered.
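The interaction between `can_retry()` and logging can be sketched as a small synchronous retry loop. Everything here is hypothetical: `with_retries`, `DemoError`, and the local `CanRetry` trait are illustrative names, and the actual client is asynchronous and structured differently.

```rust
use std::fmt::Debug;

/// Hypothetical stand-in for the retry trait discussed in this review.
pub trait CanRetry {
    fn can_retry(&self) -> bool;
}

/// Hypothetical error type used only for this sketch.
#[derive(Debug, PartialEq)]
pub enum DemoError {
    Timeout,
    InvalidRequest,
}

impl CanRetry for DemoError {
    fn can_retry(&self) -> bool {
        matches!(self, DemoError::Timeout)
    }
}

/// Runs `op`, retrying up to `max_retries` times while the error is retryable.
pub fn with_retries<T, E, F>(max_retries: usize, mut op: F) -> Result<T, E>
where
    E: CanRetry + Debug,
    F: FnMut() -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if e.can_retry() && attempt < max_retries => {
                attempt += 1;
                eprintln!("retryable error, retry {attempt} of {max_retries}: {e:?}");
            }
            Err(e) => {
                // Non-retryable errors (or exhausted retries) are logged and returned as-is.
                eprintln!("giving up: {e:?}");
                return Err(e);
            }
        }
    }
}
```

In this sketch a non-retryable error ends the loop after a single call, which is the case the log line is meant to make visible.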
packages/rs-sdk/src/sdk.rs (2)
134-136: Ensure Proper Cloning of New Fields in `Clone` Implementation
The `Clone` implementation now includes cloning of the new fields `metadata_last_seen_height`, `metadata_height_tolerance`, and `metadata_time_tolerance_ms`. Verify that these fields are correctly cloned to prevent unintended sharing or concurrency issues, especially with the atomic variable `metadata_last_seen_height`.
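The cloning concern comes down to whether clones share one counter or each get an independent copy. The following sketch contrasts the two options with hypothetical types (`SharedTracker`, `SnapshotTracker`); which behaviour the SDK intends is not stated in this comment, so neither should be read as its actual design.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Clones share one counter: an update through any clone is visible to all.
pub struct SharedTracker {
    pub last_seen_height: Arc<AtomicU64>,
}

impl Clone for SharedTracker {
    fn clone(&self) -> Self {
        Self {
            last_seen_height: Arc::clone(&self.last_seen_height),
        }
    }
}

/// Clones copy the current value: later updates are independent.
pub struct SnapshotTracker {
    pub last_seen_height: AtomicU64,
}

impl Clone for SnapshotTracker {
    fn clone(&self) -> Self {
        Self {
            // AtomicU64 is not Clone, so the value is read and wrapped in a new atomic.
            last_seen_height: AtomicU64::new(self.last_seen_height.load(Ordering::Relaxed)),
        }
    }
}
```

Sharing keeps every clone's view of the highest seen height consistent, while a snapshot lets clones drift apart over time.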
965-969: Initialize `metadata_last_seen_height` Appropriately
When initializing `metadata_last_seen_height` with `AtomicU64::new(0)`, consider whether starting at zero is appropriate. If zero is not a valid or meaningful height, this could affect staleness checks. Ensure that the initial value aligns with the expected behavior when the SDK starts operating.
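One way to reason about a zero initial value is to treat it as "nothing observed yet". The sketch below uses a hypothetical `HeightTracker` (not the SDK's type) and assumes the staleness bound is computed with a saturating subtraction, so the first response is always accepted regardless of its height.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical tracker of the highest block height observed so far.
pub struct HeightTracker {
    last_seen: AtomicU64,
    tolerance: u64,
}

impl HeightTracker {
    pub fn new(tolerance: u64) -> Self {
        Self {
            // 0 is used here to mean "no height observed yet".
            last_seen: AtomicU64::new(0),
            tolerance,
        }
    }

    /// Returns false if `received` is stale; otherwise records it and returns true.
    pub fn observe(&self, received: u64) -> bool {
        let last = self.last_seen.load(Ordering::Relaxed);
        // While last <= tolerance the bound saturates to 0, so every height passes.
        if received < last.saturating_sub(self.tolerance) {
            return false;
        }
        // fetch_max only ever raises the stored height.
        self.last_seen.fetch_max(received, Ordering::Relaxed);
        true
    }

    pub fn last_seen(&self) -> u64 {
        self.last_seen.load(Ordering::Relaxed)
    }
}
```

With this arrangement, starting at zero does not cause false positives, but it also means the very first node contacted is trusted unconditionally.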
👍
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
packages/rs-sdk/src/error.rs (3)
70-73: LGTM: New `StaleNode` variant added correctly.
The addition of the `StaleNode` variant to the `Error` enum is appropriate and aligns with the PR objective of detecting stale nodes. The use of the `#[from]` attribute for automatic conversion from `StaleNodeError` is a good practice.
Consider updating the error message to be more specific:

```diff
-    #[error(transparent)]
+    #[error("Stale node detected: {0}")]
     StaleNode(#[from] StaleNodeError),
```

This change would provide more context when the error is displayed.
94-119: LGTM: `StaleNodeError` enum is well-designed and informative.
The new `StaleNodeError` enum is a great addition, providing specific information about stale node errors. The separate variants for `Height` and `Time` allow for precise error reporting, and the error messages are descriptive and include all necessary information.
Consider moving the `StaleNodeError` enum definition above the `Error` enum in the file. This would improve code organization by defining the more specific error type before it's used in the main `Error` enum.

```rust
// Move this block above the Error enum definition
/// Server returned stale metadata
#[derive(Debug, thiserror::Error)]
pub enum StaleNodeError {
    // ... (rest of the enum definition)
}

// Then keep the Error enum as is
/// Error type for the SDK
#[derive(Debug, thiserror::Error)]
pub enum Error {
    // ... (rest of the enum definition)
}
```

This change would enhance code readability and follow the convention of defining more specific types before their usage.
Line range hint 1-119: Overall, the changes look great and achieve the PR objectives.
The modifications to `error.rs` successfully introduce new error types and handling mechanisms for stale nodes, aligning well with the PR objectives. The code is well-structured, follows Rust best practices, and improves error handling by providing more detailed information about stale node errors.
To further enhance the error handling capabilities, consider adding a logging mechanism for these errors. This could help with debugging and monitoring in production environments. For example:

```rust
use log::error;

impl Error {
    pub fn log(&self) {
        match self {
            Error::StaleNode(stale_error) => {
                error!("Stale node detected: {}", stale_error);
            }
            // Add other error types as needed
            _ => {
                error!("An error occurred: {}", self);
            }
        }
    }
}
```

This addition would allow for consistent logging of errors throughout the SDK, which could be valuable for troubleshooting and monitoring.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- packages/rs-dapi-client/src/dapi_client.rs (3 hunks)
- packages/rs-sdk/src/error.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/rs-dapi-client/src/dapi_client.rs
🧰 Additional context used
🔇 Additional comments (2)
packages/rs-sdk/src/error.rs (2)
8-8: LGTM: Import statement updated correctly.
The import statement has been properly updated to include the `CanRetry` trait, which is necessary for the new functionality being added to the `Error` enum.
88-92: Implementation looks good, but let's revisit the retry logic.
The implementation of the `CanRetry` trait for the `Error` enum is well-structured and allows for more nuanced error handling. However, there's a potential concern regarding which errors should be retryable.
Based on a previous comment by shumkov, there was a suggestion to limit retries to `Error::StaleNode` and `Error::TimeoutReached`. The current implementation follows this suggestion. However, we should verify if this is still the desired behavior or if we need to adjust the retry logic for other error types.
Could you confirm if the current retry logic aligns with the team's latest decision on error handling?
Issue being fixed or feature implemented
When fetching information from a node that has experienced a chain halt, we receive outdated information.
What was done?
Implemented two mechanisms to detect stale nodes:
How Has This Been Tested?
Added tests.
Breaking Changes
Queries to nodes that are out of sync return `StaleNodeError` instead of outdated information.
`rs_dapi_client::CanRetry::is_node_failure()` is deprecated in favor of `can_retry()`.
Checklist:
For repository code-owners and collaborators only
Summary by CodeRabbit
New Features
`StaleNodeError` type for detailed reporting of stale metadata issues.
Bug Fixes
Documentation
Tests