yozachar
Collaborator

@yozachar yozachar commented Apr 4, 2024

No description provided.

@yozachar yozachar added the maintenance PR: Alters existing source code label Apr 4, 2024
@yozachar yozachar self-assigned this Apr 4, 2024
@yozachar yozachar merged commit f89ddc2 into python-validators:master Apr 4, 2024
bmwiedemann pushed a commit to bmwiedemann/openSUSE that referenced this pull request Apr 26, 2024
https://build.opensuse.org/request/show/1170193
by user mia + anag+factory
- Update to 0.28.1
  * fix: reduce memory footprint when loading TLDs
    gh#python-validators/validators#362
  * fix: rfc cases in the domain validator
    gh#python-validators/validators#367
  * chore: documentation maintenance
    gh#python-validators/validators#368
@salty-horse
Contributor

This significantly slows down TLD lookup. Opening a file and scanning it for every email validation is very inefficient. Is the memory footprint that much of a concern?

Would you be open to changing it back to something like this, which is 10 times faster?

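from pathlib import Path

_iana_tld_set = None  # module-level cache, filled on the first call


def _iana_tld():
    """Return the set of IANA TLDs, reading the file only once."""
    global _iana_tld_set
    # Compare against None so that an empty set still counts as loaded.
    if _iana_tld_set is not None:
        return _iana_tld_set

    with Path(__file__).parent.joinpath("_tld.txt").open() as tld_f:
        _ = next(tld_f)  # skip the first line, which is not a TLD entry
        _iana_tld_set = {line.strip() for line in tld_f}
    return _iana_tld_set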

@yozachar
Collaborator Author

> Opening a file and scanning it for every email validation is very inefficient.

That's true for repeated validations.

> Is the memory footprint that much of a concern?

Yes, if the file is too large and/or system memory is insufficient.


What about a load_iana_tld() method?

It would load and store the TLDs once. If that method isn't called, the validator looks up the file every time. Associate that method with a dataclass instead of using global variables.

A PR is welcome.
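
For illustration, here is a minimal sketch of how the proposed `load_iana_tld()` could look. It is not the library's actual implementation: the class name `IanaTld`, the `path` field, and the `__contains__` lookup are hypothetical, and the file layout (one non-TLD first line, then one TLD per line) is assumed from the snippet earlier in this thread. Without the opt-in call, every lookup scans the file; after it, lookups hit an in-memory set.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator, Optional, Set


@dataclass
class IanaTld:
    """Hypothetical holder for the TLD list (name and fields are assumptions)."""

    path: Path
    # None means "not loaded": lookups then fall back to scanning the file.
    _cache: Optional[Set[str]] = field(default=None, init=False, repr=False)

    def _entries(self) -> Iterator[str]:
        """Yield TLDs one at a time, skipping the first line of the file."""
        with self.path.open() as tld_f:
            next(tld_f, None)
            for line in tld_f:
                yield line.strip()

    def load_iana_tld(self) -> None:
        """Opt in to caching: read the file once and keep the set in memory."""
        self._cache = set(self._entries())

    def __contains__(self, tld: str) -> bool:
        if self._cache is not None:
            return tld in self._cache  # fast path: set membership
        # Low-memory path: stream the file and stop at the first match.
        return any(entry == tld for entry in self._entries())


# Usage with a throwaway sample file standing in for `_tld.txt`.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "_tld.txt"
    sample.write_text("# sample first line\nCOM\nORG\n")

    lazy = IanaTld(sample)  # never loaded: scans the file per lookup
    lazy_hit = "COM" in lazy

    cached = IanaTld(sample)
    cached.load_iana_tld()  # loaded once: lookups use the set
    cached_hit = "ORG" in cached
    cached_miss = "INVALID" in cached
```

Keeping the cache on an instance rather than in a module-level global leaves the memory trade-off to the caller: callers who validate repeatedly can load once, while one-off callers pay no memory cost.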
