
Hashlib usage is underspecified #27034

@DueViktor

Description


Feature request

Starting with Python 3.9, hashlib supports the usedforsecurity argument:

Changed in version 3.9: All hashlib constructors take a keyword-only argument usedforsecurity with default value True. A false value allows the use of insecure and blocked hashing algorithms in restricted environments. False indicates that the hashing algorithm is not used in a security context, e.g. as a non-cryptographic one-way compression function.
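As a minimal sketch (assuming Python 3.9+), the flag is passed as a keyword argument either to a named constructor or to hashlib.new. It does not change the digest; it only declares the intent. Whether a call without the flag is actually blocked depends on how Python and OpenSSL were built.

```python
import hashlib

data = b"hello"

# Keyword-only flag added in Python 3.9: declares that MD5 is not used for security here.
digest = hashlib.md5(data, usedforsecurity=False).hexdigest()

# The generic constructor accepts the same keyword and yields the same digest.
same = hashlib.new("md5", data, usedforsecurity=False).hexdigest()

print(digest)  # 5d41402abc4b2a76b9719d911017c592
```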

Transformers uses hashing in many places where the purpose is not security-related. This should be made explicit in the code.

Motivation

Transformers uses MD5 from hashlib, which is not a secure algorithm, without specifying that it is used for purposes other than security. This causes issues for organisations that must follow certain security standards; FIPS compliance is one example.
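For illustration, a content fingerprint used only to detect whether a file has changed (for caching, say) is a typical non-security use of MD5. The sketch below assumes Python 3.9+, and the helper name file_fingerprint is hypothetical, not an existing Transformers function. It reads the file in chunks and marks the MD5 object as non-security.

```python
import hashlib
import os
import tempfile


def file_fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: MD5 of a file's bytes, used only as a change-detection key."""
    hasher = hashlib.md5(usedforsecurity=False)  # explicitly not a security context
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Usage: fingerprint a small temporary file, then clean it up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name

print(file_fingerprint(tmp_path))  # 5d41402abc4b2a76b9719d911017c592
os.remove(tmp_path)
```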

Your contribution

I will attach a PR that specifies the intended usage of the hashlib algorithms. Since usedforsecurity is only available in Python 3.9+ and Transformers supports Python 3.6+, I'll add functionality that detects the Python version and adjusts the keyword arguments accordingly.
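A minimal sketch of that version check, assuming a small wrapper is acceptable; the names insecure_hashlib_kwargs and md5_non_security are hypothetical, not existing Transformers APIs. On interpreters older than 3.9 the keyword is simply left out, since those versions may reject it.

```python
import hashlib
import sys
from typing import Any, Dict


def insecure_hashlib_kwargs() -> Dict[str, Any]:
    """Hypothetical helper: kwargs marking a hash as non-security use, when supported."""
    # usedforsecurity is keyword-only and exists only from Python 3.9 onwards.
    if sys.version_info >= (3, 9):
        return {"usedforsecurity": False}
    return {}


def md5_non_security(data: bytes = b""):
    """Hypothetical helper: MD5 object for non-security purposes (caching, fingerprints)."""
    return hashlib.md5(data, **insecure_hashlib_kwargs())


print(md5_non_security(b"hello").hexdigest())  # 5d41402abc4b2a76b9719d911017c592
```

Centralising the check in one helper keeps every call site identical across Python versions. An alternative is to pass the keyword and fall back on TypeError, which would also pick up the flag on any older interpreter that happens to support it.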

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
