Skip to content

Conversation

@nbbeeken
Copy link
Contributor

@nbbeeken nbbeeken commented Jan 18, 2022

DRIVERS-1954

Along with updating the section that speaks about how to detect the stale primary, I reordered our mention of the two fields to help imply the precedence along with the explcit mention of the ordering.

Points of discussion:

  • I removed the mention of "big-endian integer" comparision of ObjectIds, the byte order must remain the same on all hardware so memory comparisons are consistent
  • I refactored the pseudo code's formatting, let me know if this impacts readability
  • I added one test that will fail on drivers currently but will pass with a change to the precedence check
  • I added a line where we record the maxElectionId if we haven't yet recorded one
  • I had to modify one existing test to get it to pass with the new precedence.
  • Should we test the case where an electionId hasn't been seen yet (so maxElectionId == null) but also maxSetVersion == null and a new electionId comes in independently of a setVersion? I don't think this is a real scenario, I believe both have to be either null or non nullish together.

Node POC: mongodb/node-mongodb-native#3109

Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I confirmed that the new tests fail without driver changes and pass afterwards. There's only one hiccup in "use_setversion_without_electionid.yml". I'm also going to find someone on the server team to review this as well.

@nbbeeken nbbeeken requested a review from jasonjhchan February 4, 2022 16:22
@ShaneHarvey
Copy link
Member

@jasonjhchan Could you also help us answer this question?:

Should we test the case where an electionId hasn't been seen yet (so maxElectionId == null) but also maxSetVersion == null and a new electionId comes in independently of a setVersion? I don't think this is a real scenario, I believe both have to be either null or non nullish together.

@jasonjhchan
Copy link

jasonjhchan commented Feb 7, 2022

Should we test the case where an electionId hasn't been seen yet (so maxElectionId == null) but also maxSetVersion == null and a new electionId comes in independently of a setVersion? I don't think this is a real scenario, I believe both have to be either null or non nullish together.

I don't believe this should ever happen, but we have tests in the server for it anyways for better coverage. It could be nice to do the same in our driver tests but I will defer to your team.

@nbbeeken
Copy link
Contributor Author

nbbeeken commented Feb 7, 2022

Do not merge: Acceptance criteria, replicating this server jstest in node

Filing a follow up drivers ticket to sync sdam tests, this PR does not need to block

@nbbeeken
Copy link
Contributor Author

nbbeeken commented Feb 8, 2022

Synced SDAM test:
https://github.com/mongodb/mongo/blob/565f818c1277998b9712fe2c9fc8c6d6158df07b/src/mongo/client/sdam/json_tests/sdam_tests/rs/set_version_can_rollback.json

Outstanding Q:

I believe the maxElectionId/maxSetVersion changes the PR outlines are inconsistent with what the server does. My guess is that the server tracks the max (electionId, setVersion) tuple whereas the spec says to track each max value independently.

@shuvalov-mdb Any insight here? Drivers currently will not roll back the maxSetVersion as this new test expects on line 46 vs line 94

@shuvalov-mdb
Copy link

Hi, it's more a question do drivers actually care if the maxSetVersion rolls back? Yes, this is a valid scenario that RSM in server now supports, because server cares about maxSetVersion in its logic. But for drivers, you only care if the set of servers you track is current, as long as you can discover the primary. Do you need any other scenario?

To clarify: in servers the electionId wins, the compare ignores maxSetVersion if electionId is different, which means there is a real scenario when electionId increments and maxSetVersion decrements. The bugfix I did was to support the maxSetVersion going backwards, it never did before.

@shuvalov-mdb
Copy link

For the record - why maxSetVersion can rollback: a command that affects the set version can return before the state is replicated and increments the set version at the caller. If the primary fails over before the new state is replicated the new primary will have greater term (electionId) but old set version.

@ShaneHarvey
Copy link
Member

@shuvalov-mdb drivers also utilize maxSetVersion to detect stale primaries (see the pseudocode changes in this PR) . Are you suggesting that we only need to track maxElectionId and ignore maxSetVersion/setVersion altogether?

@nbbeeken
Copy link
Contributor Author

nbbeeken commented Mar 1, 2022

@ShaneHarvey This is RFAL, I replicated in pseudo code what I did in my driver I think the verbose variables are clearer than the deeply nested conditional, lmk what you think

Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see a few test failures:

FAIL: test_rs_electionId_setVersion (test.test_discovery_and_monitoring.TestAllScenarios)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
    yield
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
    check_outcome(self, c, phase["outcome"])
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 195, in check_outcome
    self.assertEqual(outcome.get("maxSetVersion"), topology.description.max_set_version)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 912, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 905, in _baseAssertEqual
    raise self.failureException(msg)
AssertionError: 2 != 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
    check_outcome(self, c, phase["outcome"])
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/shane/git/mongo-python-driver/test/utils.py", line 970, in assertion_context
    raise exc_type(exc_val).with_traceback(exc_tb)
  File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
    yield
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
    check_outcome(self, c, phase["outcome"])
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 195, in check_outcome
    self.assertEqual(outcome.get("maxSetVersion"), topology.description.max_set_version)
AssertionError: 2 != 1

======================================================================
FAIL: test_rs_null_election_id (test.test_discovery_and_monitoring.TestAllScenarios)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
    yield
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
    check_outcome(self, c, phase["outcome"])
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 164, in check_outcome
    self.assertEqual(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 912, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 1292, in assertMultiLineEqual
    self.fail(self._formatMessage(msg, standardMsg))
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 753, in fail
    raise self.failureException(msg)
AssertionError: 'RSPrimary' != 'Unknown'
- RSPrimary
+ Unknown


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
    check_outcome(self, c, phase["outcome"])
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/shane/git/mongo-python-driver/test/utils.py", line 970, in assertion_context
    raise exc_type(exc_val).with_traceback(exc_tb)
  File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
    yield
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
    check_outcome(self, c, phase["outcome"])
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 164, in check_outcome
    self.assertEqual(
AssertionError: 'RSPrimary' != 'Unknown'
- RSPrimary
+ Unknown


======================================================================
FAIL: test_rs_setversion_without_electionid (test.test_discovery_and_monitoring.TestAllScenarios)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
    yield
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
    check_outcome(self, c, phase["outcome"])
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 164, in check_outcome
    self.assertEqual(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 912, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 1292, in assertMultiLineEqual
    self.fail(self._formatMessage(msg, standardMsg))
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 753, in fail
    raise self.failureException(msg)
AssertionError: 'Unknown' != 'RSPrimary'
- Unknown
+ RSPrimary


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
    check_outcome(self, c, phase["outcome"])
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/shane/git/mongo-python-driver/test/utils.py", line 970, in assertion_context
    raise exc_type(exc_val).with_traceback(exc_tb)
  File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
    yield
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
    check_outcome(self, c, phase["outcome"])
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 164, in check_outcome
    self.assertEqual(
AssertionError: 'Unknown' != 'RSPrimary'
- Unknown
+ RSPrimary


======================================================================
FAIL: test_rs_use_setversion_without_electionid (test.test_discovery_and_monitoring.TestAllScenarios)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
    yield
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
    check_outcome(self, c, phase["outcome"])
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 164, in check_outcome
    self.assertEqual(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 912, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 1292, in assertMultiLineEqual
    self.fail(self._formatMessage(msg, standardMsg))
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py", line 753, in fail
    raise self.failureException(msg)
AssertionError: 'Unknown' != 'RSPrimary'
- Unknown
+ RSPrimary


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
    check_outcome(self, c, phase["outcome"])
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/shane/git/mongo-python-driver/test/utils.py", line 970, in assertion_context
    raise exc_type(exc_val).with_traceback(exc_tb)
  File "/Users/shane/git/mongo-python-driver/test/utils.py", line 965, in assertion_context
    yield
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 213, in run_scenario
    check_outcome(self, c, phase["outcome"])
  File "/Users/shane/git/mongo-python-driver/test/test_discovery_and_monitoring.py", line 164, in check_outcome
    self.assertEqual(
AssertionError: 'Unknown' != 'RSPrimary'
- Unknown
+ RSPrimary

I haven't looked into all the failures but the last one looks like a problem with the use_setversion_without_electionid.yml test. In the step for # Reconfig, B reports as primary, B is missing the electionId but reports setVersion, shouldn't B's response be considered stale with the updated comparison rules?

Here's how I've implemented the comparison which I think is correct. MinKey compares False for > and < for all objects and True for == with other MinKey which simplifies the comparison logic:

    max_election_tuple = max_election_id, max_set_version
    new_election_tuple = server_description.election_id, server_description.set_version
    max_election_compare_safe = tuple(i if i is not None else MinKey() for i in max_election_tuple)
    new_election_compare_safe = tuple(i if i is not None else MinKey() for i in new_election_tuple)

    if new_election_compare_safe >= max_election_compare_safe:
        max_election_id, max_set_version = new_election_tuple
    else:
        # Stale primary, set to type Unknown.
        sds[server_description.address] = server_description.to_unknown()
        return _check_has_primary(sds), replica_set_name, max_set_version, max_election_id

@nbbeeken
Copy link
Contributor Author

nbbeeken commented Mar 3, 2022

You're missing the leading if stmt:
if description.electionId is not null and description.setVersion is not null:
We don't enter the block where staleness is marked if the description doesn't have both of these properties defined, and that was true before this spec change, I'm still thinking if that makes sense or not 🤔

@ShaneHarvey
Copy link
Member

I believe that if stmt is incorrect. A null field always compares < a non-null field, so if the max tuple is (OID(1), 1) and the new one has a null, the new one should be considered stale. This is how the server works, see:
https://github.com/mongodb/mongo/blob/r5.2.0/src/mongo/client/sdam/json_tests/sdam_tests/rs/use_setversion_without_electionid.json#L52-L84

Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new tests pass in Python.

and topologyDescription.maxSetVersion < serverDescription.setVersion
):
topologyDescription.maxElectionId = serverDescription.electionId
topologyDescription.maxSetVersion = serverDescription.setVersion
Copy link
Member

@ShaneHarvey ShaneHarvey Mar 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to see this if-stmt and the "# Stale primary" be combined into one if/else for simplicity. If we are leaving the null comparison implied and not explicit (which the current code does) then what do you think about this?:

    if (serverDescription.electionId > serverDescription.maxElectionId or
        (topologyDescription.electionId == serverDescription.maxElectionId and
         topologyDescription.setVersion >= serverDescription.maxSetVersion)):

        topologyDescription.maxElectionId = serverDescription.electionId
        topologyDescription.maxSetVersion = serverDescription.setVersion
    else:
        # Stale primary....

If we want to handle null more explicitly:

    newElectionId = serverDescription.electionId or ObjectId('000000000000000000000000')
    newSetVersion = serverDescription.setVersion or 0
    maxElectionId = topologyDescription.electionId or ObjectId('000000000000000000000000')
    maxSetVersion = topologyDescription.setVersion or 0
    if (newElectionId > maxElectionId or
        (newElectionId == maxElectionId and newSetVersion >= maxSetVersion)):
        topologyDescription.maxElectionId = serverDescription.electionId
        topologyDescription.maxSetVersion = serverDescription.setVersion
    else:
        # Stale primary....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with just the first way for now, We could maybe expand the null comment with the suggestions for zero OID/setVersion, I'm thinking its too verbose for pseudo code.

@nbbeeken nbbeeken requested review from ShaneHarvey and jyemin March 7, 2022 19:53
Copy link
Contributor

@jyemin jyemin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nbbeeken nbbeeken merged commit 5bd06a8 into master Mar 7, 2022
@nbbeeken nbbeeken deleted the DRIVERS-1954/electionId-priority-over-setVersion branch March 7, 2022 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants