Skip to content

Conversation

Khoross
Copy link
Contributor

@Khoross Khoross commented Mar 17, 2020

  • Please check if the Pull Request fulfills these requirements
  • [Y] The commit message is clear and concise
  • [Y] Test for the changes have been added and reviewed
  • [Y] Documentation has been added to all new features and edited code has had documentation reviewed
  • What kind of change does this Pull request introduce?
    Replacement for the NHS number generator algorithm

  • What is the current behaviour?
    Currently numbers are generated via a Monte Carlo method, generating random numbers until sufficiently many valid numbers have been made. This takes on average 11 attempts per number requested. This is also done sequentially rather than in a parallel or vectorized method.

  • What is the new behaviour?
    An algorithm based on modular arithmetic to generate known valid numbers. This works as follows:

  • The central 8 digits of the NHS number are generated, uniformly from 0 to 9

  • The contribution of the central digits to the check digit is calculated

  • A random number from 1 to 8 is generated for the leading digit

  • We calculate the leading digit value for which there would be an invalid check digit

  • If the random number for the leading digit is at or above the calculated value, then we increase it by 1

  • We finally calculate the (guaranteed to be valid) check digit
    This is all performed using numpy, to allow for vectorized calculations, greatly speeding up processing time if a large number are required.

  • Does this Pull Request introduce a breaking change?
    The new algorithm interacts with the random number generator in a different way, and so produces different results from the same seed. This cannot be avoided.

@georgeRobertson
Copy link
Collaborator

georgeRobertson commented May 27, 2020

Hi @Khoross , thank you for taking the time to create a pull request to the Codon repository. I also apologise for the delay in responding to your pull request, things have been very busy of late.

Unfortunatly, I am unable to merge your NHS number generator into Codon due to the build failing based on this error...

__________________ ERROR collecting codonPython/nhsNumber.py ___________________
codonPython/nhsNumber.py:2: in
from numpy.random import default_rng
E ImportError: cannot import name 'default_rng'

_____________ ERROR collecting codonPython/tests/nhsNumber_test.py _____________

codonPython/tests/nhsNumber_test.py:1: in
from codonPython.nhsNumber import nhsNumberGenerator, nhsNumberValidator
codonPython/nhsNumber.py:2: in
from numpy.random import default_rng
E ImportError: cannot import name 'default_rng'

Thanks,
George

Copy link
Collaborator

@georgeRobertson georgeRobertson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Khoross,

Firstly, I'd like to apologies for the delay in reviewing your merge request. As you know, there hasn't been a lot of time to focus on Codon due to the impact Covid19 on current work volume.

I have reviewed your changes and it all looks good, however, there are a couple of changes required so that the code is adhering to PEP8. The issues that I found were that some of the lines were to long (must be <= 70) and that there were missing blank lines at the end of the nhsNumber.py and nhsNumber_test.py.

I was able to verify this using a Python linter known as Flake, this will automatically check the code against pep8 and identify any problems. This might be worth exploring in the future and can easily be installed using Conda or Pip package managers.

When you get chance, please could you update your branch against Codon's master branch and make the requested changes.

"""

if random_state:
random.seed(random_state)
if not isinstance(to_generate, int):
raise ValueError("Please input a positive integer to generate numbers.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is too long, please reduce to 70

if not isinstance(to_generate, int):
raise ValueError("Please input a positive integer to generate numbers.")
if to_generate > 1000000:
raise ValueError("More than one million values requested")
if to_generate < 0:
raise ValueError("Please input a postitive integer to generate numbers.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is too long, please reduce to 70

# First, generate 8 digits, using numpy.randint (the middle 8 digits)
# Second, generate the check digit portions for each block of 8 digits
# Third, generate 1 digit (the 1st digit) between 1 and 8
# increase this value by 1 if it is at or above the value which would cause a check digit of 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is too long, please reduce to 70

# Second, generate the check digit portions for each block of 8 digits
# Third, generate 1 digit (the 1st digit) between 1 and 8
# increase this value by 1 if it is at or above the value which would cause a check digit of 10
# be aware that this will not produce a fully uniform distribution over NHS numbers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is too long, please reduce to 70

# Third, generate 1 digit (the 1st digit) between 1 and 8
# increase this value by 1 if it is at or above the value which would cause a check digit of 10
# be aware that this will not produce a fully uniform distribution over NHS numbers
# the distribution will not produce any NHS number with a leading digit (or check digit) of 1 where the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is too long, please reduce to 70


result = np.dot(result_digits, 10 ** np.arange(9, -1, -1, dtype=np.int64))

return [int(val) for val in result]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra line required after line 101

@@ -5,20 +5,32 @@

@pytest.mark.parametrize(
"to_generate, random_state, expected",
[(3, 42, [8429141456, 2625792787, 8235363119]), (2, 1, [9598980006, 6597925149])],
[(3, 42, [7065337065, 6104866670, 4417443181]), (2, 1, [6446801785, 4227327237])],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is too long, please reduce to 70

@@ -5,20 +5,32 @@

@pytest.mark.parametrize(
"to_generate, random_state, expected",
[(3, 42, [8429141456, 2625792787, 8235363119]), (2, 1, [9598980006, 6597925149])],
[(3, 42, [7065337065, 6104866670, 4417443181]), (2, 1, [6446801785, 4227327237])],
)
def test_nhsNumberGenerator_BAU(to_generate, random_state, expected):
assert expected == nhsNumberGenerator(to_generate, random_state=random_state)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is too long, please reduce to 70

assert all(
(
nhsNumberValidator(val)
for val in nhsNumberGenerator(to_generate, random_state=random_state)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is too long, please reduce to 70

def test_nhsNumberGenerator_valueErrors(to_generate):
with pytest.raises(ValueError):
nhsNumberGenerator(to_generate)


@pytest.mark.parametrize(
"to_validate, expected", [(9598980006, True), (9598980007, False)]
"to_validate, expected", [(6771116069, True), (9598980007, False)]
)
def test_nhsNumberValidator_BAU(to_validate, expected):
assert expected == nhsNumberValidator(to_validate)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blank line required after line 24

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants