-
Notifications
You must be signed in to change notification settings - Fork 1.1k
PYTHON-5355 Addition of API to move to and from NumPy ndarrays and BSON BinaryVectors #2590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
PYTHON-5355 Addition of API to move to and from NumPy ndarrays and BSON BinaryVectors #2590
Conversation
bson/binary.py
Outdated
|
||
_NUMPY_AVAILABLE = True | ||
except ImportError: | ||
np = None # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC importing numpy is very very slow. In that case we should not even attempt to import it by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed to lazy imports, and used importlib.util.find_spec("numpy")
as skip condition in test_bson.py
.
"""Subtype of this binary data.""" | ||
return self.__subtype | ||
|
||
def as_numpy_vector(self) -> BinaryVector: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a new method or a new argument to the existing as_vector method? Like binary.as_vector(numpy=True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to this as alternative to additional function as_numpy_vector
. What do you think, @blink1073 ?
justfile
Outdated
|
||
# Commonly used command segments. | ||
typing_run := "uv run --group typing --extra aws --extra encryption --extra ocsp --extra snappy --extra test --extra zstd" | ||
typing_run := "uv run --group typing --extra aws --extra encryption --extra numpy --extra ocsp --extra snappy --extra test --extra zstd" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding a new extra we could use --with numpy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a great suggestion. I had to change invocations from uv run mypy .
(or pytest) to be done ad modules uv run python -m mypy .
but it works. I removed the extra from pyproject and requirements/.
elif dtype == BinaryVectorDtype.FLOAT32: | ||
data = np.frombuffer(self[2:], dtype="float32") | ||
elif dtype == BinaryVectorDtype.PACKED_BIT: | ||
data = np.frombuffer(self[2:], dtype="uint8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we following the rules of the spec for validating PACKED_BIT here (eg the validation in as_vector)?
Issue Key
Summary
The existing API for
bson.binary.BinaryVector
requires that the vector of numbers be a list of either floats or integers. This extends that to allow one to use NumPy Arrays, which are the most common form of vectors used in Python.Changes in this PR
data
member of thebson.binary.BinaryVector
to include np.ndarray:Binary.from_vector
now accepts this input, and uses NumPy'stobytes()
instead ofstruct.pack
as is required for lists.A new instance method,
as_numpy_vector
was added. This was chosen so that we maintain backwards-compatibility. There was also concern about needlessly introducing Numpy as a dependency.Testing Plan
New tests are found in
test/test_bson.py
.Checklist
Checklist for Author
Checklist for Reviewer @ShaneHarvey @blink1073 @Jibola
Focus Areas for Reviewer (optional)