DRIVERS-1954: SDAM should give priority to electionId over setVersion when updating topology #1122
Nice work! I confirmed that the new tests fail without driver changes and pass afterwards. There's only one hiccup in "use_setversion_without_electionid.yml". I'm also going to find someone on the server team to review this as well.
source/server-discovery-and-monitoring/tests/rs/use_setversion_without_electionid.yml
@jasonjhchan Could you also help us answer this question?:
|
I don't believe this should ever happen, but we have tests in the server for it anyways for better coverage. It could be nice to do the same in our driver tests but I will defer to your team. |
|
Filing a follow up drivers ticket to sync sdam tests, this PR does not need to block |
|
Outstanding Q:
@shuvalov-mdb Any insight here? Drivers currently will not roll back the |
|
Hi, it's more a question do drivers actually care if the To clarify: in servers the |
|
For the record, here is why maxSetVersion can roll back: a command that changes the set version can return before the new state is replicated, so the caller already sees the incremented set version. If the primary fails over before the new state is replicated, the new primary will have a greater term (electionId) but the old set version.
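The rollback scenario described above can be walked through with a toy model. This is only an illustrative sketch: the `Hello` tuple, the integer election ids, and the `is_stale` helper are assumptions made for the example, not driver or server code.

```python
from typing import NamedTuple, Optional


class Hello(NamedTuple):
    """Hypothetical stand-in for the two fields a primary reports."""
    election_id: Optional[int]  # real electionIds are ObjectIds; ints keep the demo short
    set_version: Optional[int]


def is_stale(new: Hello, max_seen: Hello, election_first: bool) -> bool:
    """Return True when `new` compares lower than the maximum seen so far."""
    if election_first:
        return (new.election_id, new.set_version) < (max_seen.election_id, max_seen.set_version)
    return (new.set_version, new.election_id) < (max_seen.set_version, max_seen.election_id)


# The old primary acknowledged a reconfig (setVersion 2) that was never replicated.
old_primary = Hello(election_id=1, set_version=2)
# After failover the new primary has a greater term but the old set version.
new_primary = Hello(election_id=2, set_version=1)

# Comparing setVersion first wrongly treats the new primary as stale.
print(is_stale(new_primary, old_primary, election_first=False))  # True
# Comparing electionId first accepts it.
print(is_stale(new_primary, old_primary, election_first=True))   # False
```

Only non-null values are used here, so the plain tuple comparison is enough; null handling is a separate question in this thread.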
|
@shuvalov-mdb drivers also utilize maxSetVersion to detect stale primaries (see the pseudocode changes in this PR) . Are you suggesting that we only need to track maxElectionId and ignore maxSetVersion/setVersion altogether? |
source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst
@ShaneHarvey This is RFAL. I replicated in pseudocode what I did in my driver; I think the verbose variables are clearer than the deeply nested conditional. Lmk what you think.
ShaneHarvey
left a comment
I still see a few test failures:
FAIL: test_rs_electionId_setVersion (test.test_discovery_and_monitoring.TestAllScenarios)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
yield
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
check_outcome(self, c, phase["outcome"])
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 195, in check_outcome
self.assertEqual(outcome.get("maxSetVersion"), topology.description.max_set_version)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 912, in assertEqual
assertion_func(first, second, msg=msg)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 905, in _baseAssertEqual
raise self.failureException(msg)
AssertionError: 2 != 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
check_outcome(self, c, phase["outcome"])
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/Users/shane/git/mongo-python-driver/test/utils.py", line 970, in assertion_context
raise exc_type(exc_val).with_traceback(exc_tb)
File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
yield
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
check_outcome(self, c, phase["outcome"])
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 195, in check_outcome
self.assertEqual(outcome.get("maxSetVersion"), topology.description.max_set_version)
AssertionError: 2 != 1
======================================================================
FAIL: test_rs_null_election_id (test.test_discovery_and_monitoring.TestAllScenarios)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
yield
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
check_outcome(self, c, phase["outcome"])
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 164, in check_outcome
self.assertEqual(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 912, in assertEqual
assertion_func(first, second, msg=msg)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 1292, in assertMultiLineEqual
self.fail(self._formatMessage(msg, standardMsg))
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 753, in fail
raise self.failureException(msg)
AssertionError: 'RSPrimary' != 'Unknown'
- RSPrimary
+ Unknown
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
check_outcome(self, c, phase["outcome"])
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/Users/shane/git/mongo-python-driver/test/utils.py", line 970, in assertion_context
raise exc_type(exc_val).with_traceback(exc_tb)
File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
yield
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
check_outcome(self, c, phase["outcome"])
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 164, in check_outcome
self.assertEqual(
AssertionError: 'RSPrimary' != 'Unknown'
- RSPrimary
+ Unknown
======================================================================
FAIL: test_rs_setversion_without_electionid (test.test_discovery_and_monitoring.TestAllScenarios)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
yield
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
check_outcome(self, c, phase["outcome"])
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 164, in check_outcome
self.assertEqual(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 912, in assertEqual
assertion_func(first, second, msg=msg)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 1292, in assertMultiLineEqual
self.fail(self._formatMessage(msg, standardMsg))
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 753, in fail
raise self.failureException(msg)
AssertionError: 'Unknown' != 'RSPrimary'
- Unknown
+ RSPrimary
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
check_outcome(self, c, phase["outcome"])
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/Users/shane/git/mongo-python-driver/test/utils.py", line 970, in assertion_context
raise exc_type(exc_val).with_traceback(exc_tb)
File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
yield
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
check_outcome(self, c, phase["outcome"])
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 164, in check_outcome
self.assertEqual(
AssertionError: 'Unknown' != 'RSPrimary'
- Unknown
+ RSPrimary
======================================================================
FAIL: test_rs_use_setversion_without_electionid (test.test_discovery_and_monitoring.TestAllScenarios)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
yield
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
check_outcome(self, c, phase["outcome"])
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 164, in check_outcome
self.assertEqual(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 912, in assertEqual
assertion_func(first, second, msg=msg)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 1292, in assertMultiLineEqual
self.fail(self._formatMessage(msg, standardMsg))
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 753, in fail
raise self.failureException(msg)
AssertionError: 'Unknown' != 'RSPrimary'
- Unknown
+ RSPrimary
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
check_outcome(self, c, phase["outcome"])
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/Users/shane/git/mongo-python-driver/test/utils.py", line 970, in assertion_context
raise exc_type(exc_val).with_traceback(exc_tb)
File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
yield
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
check_outcome(self, c, phase["outcome"])
File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 164, in check_outcome
self.assertEqual(
AssertionError: 'Unknown' != 'RSPrimary'
- Unknown
+ RSPrimary
I haven't looked into all the failures, but the last one looks like a problem with the use_setversion_without_electionid.yml test. In the "# Reconfig" step, B reports as primary and is missing the electionId but reports a setVersion. Shouldn't B's response be considered stale under the updated comparison rules?
Here's how I've implemented the comparison, which I think is correct. MinKey orders below every other object and compares equal only to another MinKey, which simplifies the comparison logic:
max_election_tuple = max_election_id, max_set_version
new_election_tuple = server_description.election_id, server_description.set_version
max_election_compare_safe = tuple(i if i is not None else MinKey() for i in max_election_tuple)
new_election_compare_safe = tuple(i if i is not None else MinKey() for i in new_election_tuple)
if new_election_compare_safe >= max_election_compare_safe:
    max_election_id, max_set_version = new_election_tuple
else:
    # Stale primary, set to type Unknown.
    sds[server_description.address] = server_description.to_unknown()
    return _check_has_primary(sds), replica_set_name, max_set_version, max_election_id
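To try the "null sorts lowest" ordering without installing a BSON library, a standalone sketch is given below. It swaps MinKey for a plain-Python sort key, `(value is not None, value)`, which is a different technique with the same intent. The names `null_low` and `update_max` and the use of ints in place of ObjectId are assumptions for illustration only.

```python
from typing import Optional, Tuple

# (election_id, set_version); ints stand in for ObjectId in this sketch.
Pair = Tuple[Optional[int], Optional[int]]


def null_low(pair: Pair):
    """Build a sort key in which None orders below every real value."""
    return tuple((value is not None, value) for value in pair)


def update_max(max_pair: Pair, new_pair: Pair):
    """Return (new_max_pair, is_stale) for a primary reporting new_pair."""
    if null_low(new_pair) >= null_low(max_pair):
        return new_pair, False
    return max_pair, True


# A primary that omits electionId but reports a higher setVersion is stale
# once a non-null electionId has been recorded.
print(update_max((1, 1), (None, 2)))        # ((1, 1), True)
# With nothing recorded yet, the same response is accepted.
print(update_max((None, None), (None, 2)))  # ((None, 2), False)
```

The first call mirrors the question raised about the "# Reconfig" step: with electionId compared first, a missing electionId loses to any recorded one regardless of setVersion.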
You're missing the leading if stmt: |
|
I believe that if stmt is incorrect. A null field always compares < a non-null field, so if the max tuple is |
ShaneHarvey
left a comment
The new tests pass in Python.
    and topologyDescription.maxSetVersion < serverDescription.setVersion
):
    topologyDescription.maxElectionId = serverDescription.electionId
    topologyDescription.maxSetVersion = serverDescription.setVersion
I would prefer to see this if-stmt and the "# Stale primary" be combined into one if/else for simplicity. If we are leaving the null comparison implied and not explicit (which the current code does) then what do you think about this?:
if (serverDescription.electionId > topologyDescription.maxElectionId or
        (serverDescription.electionId == topologyDescription.maxElectionId and
         serverDescription.setVersion >= topologyDescription.maxSetVersion)):
    topologyDescription.maxElectionId = serverDescription.electionId
    topologyDescription.maxSetVersion = serverDescription.setVersion
else:
    # Stale primary....
If we want to handle null more explicitly:
newElectionId = serverDescription.electionId or ObjectId('000000000000000000000000')
newSetVersion = serverDescription.setVersion or 0
maxElectionId = topologyDescription.maxElectionId or ObjectId('000000000000000000000000')
maxSetVersion = topologyDescription.maxSetVersion or 0
if (newElectionId > maxElectionId or
        (newElectionId == maxElectionId and newSetVersion >= maxSetVersion)):
    topologyDescription.maxElectionId = serverDescription.electionId
    topologyDescription.maxSetVersion = serverDescription.setVersion
else:
    # Stale primary....
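As a rough illustration, the explicit-null variant could be written as runnable Python along these lines. The `Topology` dataclass, the `accept_primary` helper, and the use of 12-byte `bytes` values in place of ObjectId are assumptions for this sketch, not part of the spec or of any driver.

```python
from dataclasses import dataclass
from typing import Optional

ZERO_ID = bytes(12)  # stand-in for ObjectId('000000000000000000000000')


@dataclass
class Topology:
    max_election_id: Optional[bytes] = None
    max_set_version: Optional[int] = None


def accept_primary(topology: Topology, election_id: Optional[bytes],
                   set_version: Optional[int]) -> bool:
    """Update the maxima and return True, or return False for a stale primary."""
    # Substitute the lowest possible values for nulls before comparing.
    new_id = election_id or ZERO_ID
    new_version = set_version or 0
    max_id = topology.max_election_id or ZERO_ID
    max_version = topology.max_set_version or 0
    if new_id > max_id or (new_id == max_id and new_version >= max_version):
        # Store the values as reported, including any nulls.
        topology.max_election_id = election_id
        topology.max_set_version = set_version
        return True
    return False


topology = Topology()
print(accept_primary(topology, bytes([0] * 11 + [2]), 1))  # True
print(accept_primary(topology, bytes([0] * 11 + [1]), 5))  # False
print(accept_primary(topology, bytes([0] * 11 + [2]), 2))  # True
```

The second call shows the point of the change: a lower electionId is rejected even though its setVersion is higher.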
Went with just the first way for now. We could maybe expand the null comment with the suggestions for a zero OID/setVersion, but I'm thinking it's too verbose for pseudocode.
jyemin
left a comment
LGTM
DRIVERS-1954
Along with updating the section that describes how to detect a stale primary, I reordered our mentions of the two fields to help imply the precedence, in addition to the explicit mention of the ordering.
Points of discussion:
maxElectionId == null) but also maxSetVersion == null and a new electionId comes in independently of a setVersion? I don't think this is a real scenario; I believe both have to be either null or non nullish together.
Node POC: mongodb/node-mongodb-native#3109