-
Notifications
You must be signed in to change notification settings - Fork 13
Nhs number generator improvements #121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
Hi @Khoross , thank you for taking the time to create a pull request to the Codon repository. I also apologise for the delay in responding to your pull request, things have been very busy of late. Unfortunatly, I am unable to merge your NHS number generator into Codon due to the build failing based on this error...
Thanks, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Khoross,
Firstly, I'd like to apologies for the delay in reviewing your merge request. As you know, there hasn't been a lot of time to focus on Codon due to the impact Covid19 on current work volume.
I have reviewed your changes and it all looks good, however, there are a couple of changes required so that the code is adhering to PEP8. The issues that I found were that some of the lines were to long (must be <= 70) and that there were missing blank lines at the end of the nhsNumber.py and nhsNumber_test.py.
I was able to verify this using a Python linter known as Flake, this will automatically check the code against pep8 and identify any problems. This might be worth exploring in the future and can easily be installed using Conda or Pip package managers.
When you get chance, please could you update your branch against Codon's master branch and make the requested changes.
""" | ||
|
||
if random_state: | ||
random.seed(random_state) | ||
if not isinstance(to_generate, int): | ||
raise ValueError("Please input a positive integer to generate numbers.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line is too long, please reduce to 70
if not isinstance(to_generate, int): | ||
raise ValueError("Please input a positive integer to generate numbers.") | ||
if to_generate > 1000000: | ||
raise ValueError("More than one million values requested") | ||
if to_generate < 0: | ||
raise ValueError("Please input a postitive integer to generate numbers.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line is too long, please reduce to 70
# First, generate 8 digits, using numpy.randint (the middle 8 digits) | ||
# Second, generate the check digit portions for each block of 8 digits | ||
# Third, generate 1 digit (the 1st digit) between 1 and 8 | ||
# increase this value by 1 if it is at or above the value which would cause a check digit of 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line is too long, please reduce to 70
# Second, generate the check digit portions for each block of 8 digits | ||
# Third, generate 1 digit (the 1st digit) between 1 and 8 | ||
# increase this value by 1 if it is at or above the value which would cause a check digit of 10 | ||
# be aware that this will not produce a fully uniform distribution over NHS numbers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line is too long, please reduce to 70
# Third, generate 1 digit (the 1st digit) between 1 and 8 | ||
# increase this value by 1 if it is at or above the value which would cause a check digit of 10 | ||
# be aware that this will not produce a fully uniform distribution over NHS numbers | ||
# the distribution will not produce any NHS number with a leading digit (or check digit) of 1 where the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line is too long, please reduce to 70
|
||
result = np.dot(result_digits, 10 ** np.arange(9, -1, -1, dtype=np.int64)) | ||
|
||
return [int(val) for val in result] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra line required after line 101
@@ -5,20 +5,32 @@ | |||
|
|||
@pytest.mark.parametrize( | |||
"to_generate, random_state, expected", | |||
[(3, 42, [8429141456, 2625792787, 8235363119]), (2, 1, [9598980006, 6597925149])], | |||
[(3, 42, [7065337065, 6104866670, 4417443181]), (2, 1, [6446801785, 4227327237])], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line is too long, please reduce to 70
@@ -5,20 +5,32 @@ | |||
|
|||
@pytest.mark.parametrize( | |||
"to_generate, random_state, expected", | |||
[(3, 42, [8429141456, 2625792787, 8235363119]), (2, 1, [9598980006, 6597925149])], | |||
[(3, 42, [7065337065, 6104866670, 4417443181]), (2, 1, [6446801785, 4227327237])], | |||
) | |||
def test_nhsNumberGenerator_BAU(to_generate, random_state, expected): | |||
assert expected == nhsNumberGenerator(to_generate, random_state=random_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line is too long, please reduce to 70
assert all( | ||
( | ||
nhsNumberValidator(val) | ||
for val in nhsNumberGenerator(to_generate, random_state=random_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line is too long, please reduce to 70
def test_nhsNumberGenerator_valueErrors(to_generate): | ||
with pytest.raises(ValueError): | ||
nhsNumberGenerator(to_generate) | ||
|
||
|
||
@pytest.mark.parametrize( | ||
"to_validate, expected", [(9598980006, True), (9598980007, False)] | ||
"to_validate, expected", [(6771116069, True), (9598980007, False)] | ||
) | ||
def test_nhsNumberValidator_BAU(to_validate, expected): | ||
assert expected == nhsNumberValidator(to_validate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blank line required after line 24
What kind of change does this Pull request introduce?
Replacement for the NHS number generator algorithm
What is the current behaviour?
Currently numbers are generated via a Monte Carlo method, generating random numbers until sufficiently many valid numbers have been made. This takes on average 11 attempts per number requested. This is also done sequentially rather than in a parallel or vectorized method.
What is the new behaviour?
An algorithm based on modular arithmetic to generate known valid numbers. This works as follows:
The central 8 digits of the NHS number are generated, uniformly from 0 to 9
The contribution of the central digits to the check digit is calculated
A random number from 1 to 8 is generated for the leading digit
We calculate the leading digit value for which there would be an invalid check digit
If the random number for the leading digit is at or above the calculated value, then we increase it by 1
We finally calculate the (guaranteed to be valid) check digit
This is all performed using numpy, to allow for vectorized calculations, greatly speeding up processing time if a large number are required.
Does this Pull Request introduce a breaking change?
The new algorithm interacts with the random number generator in a different way, and so produces different results from the same seed. This cannot be avoided.