Contributor

@caseyclements caseyclements commented Oct 15, 2025

Issue Key

Summary

The existing API for bson.binary.BinaryVector requires the vector of numbers to be a list of either floats or integers. This PR extends it to also accept NumPy arrays, which are the most common representation of vectors in Python.

Changes in this PR

  1. We extend the type of the data member of bson.binary.BinaryVector to include np.ndarray:
class BinaryVector:
    """Vector of numbers along with metadata for binary interoperability.
    .. versionadded:: 4.10
    """

    __slots__ = ("data", "dtype", "padding")

    def __init__(
        self,
        data: Union[Sequence[float | int], np.ndarray],
        dtype: BinaryVectorDtype,
        padding: int = 0,
    ):
  2. Binary.from_vector now accepts this input and uses NumPy's tobytes() to serialize it, rather than struct.pack, which is still required for lists.

  3. A new instance method, as_numpy_vector, was added. A separate method was chosen to maintain backwards compatibility and to avoid needlessly introducing NumPy as a dependency.
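To illustrate the second change, here is a minimal sketch of why tobytes() can stand in for struct.pack. It does not use the PR's code: pack_float32_list and pack_float32_array are hypothetical helpers, and the two-byte header (dtype byte 0x27 for float32, then the padding byte) is an assumption based on the BSON binary vector layout. The NumPy comparison only runs when NumPy is installed.

```python
import importlib.util
import struct

# Assumption: dtype byte for float32 vectors in the BSON binary vector layout.
FLOAT32_DTYPE = 0x27


def pack_float32_list(values, padding=0):
    """Hypothetical helper: pack a list of floats with struct.pack."""
    header = struct.pack("<BB", FLOAT32_DTYPE, padding)
    return header + struct.pack(f"<{len(values)}f", *values)


def pack_float32_array(arr, padding=0):
    """Hypothetical helper: pack a NumPy array with tobytes()."""
    header = struct.pack("<BB", FLOAT32_DTYPE, padding)
    # Force little-endian float32 so the bytes do not depend on the host.
    return header + arr.astype("<f4", copy=False).tobytes()


values = [1.0, -2.5, 0.25]
packed = pack_float32_list(values)

if importlib.util.find_spec("numpy") is not None:
    import numpy as np

    arr = np.array(values, dtype=np.float32)
    # Both paths should yield byte-identical payloads.
    print(pack_float32_array(arr) == packed)
```

The array path avoids unpacking every element into a Python float, which is the main reason to prefer it for large vectors.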

Testing Plan

New tests are found in test/test_bson.py.

Checklist

Checklist for Author

  • Did you update the changelog (if necessary)?
  • Is the intention of the code captured in relevant tests?
  • If there are new TODOs, has a related JIRA ticket been created?

Checklist for Reviewer @ShaneHarvey @blink1073 @Jibola

  • Does the title of the PR reference a JIRA Ticket?
  • Do you fully understand the implementation? (Would you be comfortable explaining how this code works to someone else?)
  • Have you checked for spelling & grammar errors?
  • Is all relevant documentation (README or docstring) updated?

Focus Areas for Reviewer (optional)

@caseyclements caseyclements requested a review from a team as a code owner October 15, 2025 20:07
@caseyclements caseyclements changed the title from "INTPYTHON-5355 Addition of API to move to and from NumPy ndarrays to BinaryVector" to "INTPYTHON-5355 Addition of API to move to and from NumPy ndarrays and BSON BinaryVectors" Oct 15, 2025
@caseyclements caseyclements changed the title from "INTPYTHON-5355 Addition of API to move to and from NumPy ndarrays and BSON BinaryVectors" to "PYTHON-5355 Addition of API to move to and from NumPy ndarrays and BSON BinaryVectors" Oct 16, 2025
bson/binary.py Outdated

    _NUMPY_AVAILABLE = True
except ImportError:
    np = None  # type: ignore
Member

IIRC importing numpy is very very slow. In that case we should not even attempt to import it by default.

Contributor Author

I've changed to lazy imports, and used importlib.util.find_spec("numpy") as the skip condition in test_bson.py.
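A minimal sketch of the pattern described in this reply, not the PR's actual code: the import happens inside the function that needs it, and the test is skipped when find_spec cannot locate NumPy. The names to_numpy and TestNumpyConversion are hypothetical.

```python
import importlib.util
import unittest

# find_spec locates the package without importing it, so the check is cheap.
_HAVE_NUMPY = importlib.util.find_spec("numpy") is not None


def to_numpy(values):
    """Hypothetical helper: import numpy only when a conversion is requested."""
    try:
        import numpy as np
    except ImportError as exc:
        raise ImportError("NumPy is required for this conversion") from exc
    return np.asarray(values, dtype=np.float32)


class TestNumpyConversion(unittest.TestCase):
    @unittest.skipUnless(_HAVE_NUMPY, "numpy is not installed")
    def test_round_trip(self):
        arr = to_numpy([1.0, 2.0])
        self.assertEqual(arr.tolist(), [1.0, 2.0])
```

With this shape, importing the module itself never pays the cost of importing NumPy; only callers of to_numpy do.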

"""Subtype of this binary data."""
return self.__subtype

def as_numpy_vector(self) -> BinaryVector:
Member

Should this be a new method or a new argument to the existing as_vector method? Like binary.as_vector(numpy=True)

Contributor Author

I'm open to this as an alternative to the additional as_numpy_vector method. What do you think, @blink1073?
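For comparison, a rough sketch of the two API shapes under discussion. VectorPayload is a hypothetical stand-in for Binary that holds already-decoded floats, and the list return type is a simplification; the real as_vector returns a BinaryVector. The point is only that a boolean flag makes the return type depend on an argument, which needs overloads to type precisely.

```python
from typing import Any, List, Literal, overload


class VectorPayload:
    """Hypothetical stand-in for Binary; holds already-decoded floats."""

    def __init__(self, values: List[float]) -> None:
        self._values = list(values)

    # Option A: one method whose return type depends on a flag.
    @overload
    def as_vector(self, numpy: Literal[False] = ...) -> List[float]: ...
    @overload
    def as_vector(self, numpy: Literal[True]) -> Any: ...
    def as_vector(self, numpy: bool = False) -> Any:
        if not numpy:
            return list(self._values)
        import numpy as np  # lazy import, only on the NumPy path

        return np.asarray(self._values, dtype=np.float32)

    # Option B: a separate method with a single, fixed return type.
    def as_numpy_vector(self) -> Any:
        return self.as_vector(numpy=True)


payload = VectorPayload([1.0, 2.0])
print(payload.as_vector())
```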

justfile Outdated

# Commonly used command segments.
-typing_run := "uv run --group typing --extra aws --extra encryption --extra ocsp --extra snappy --extra test --extra zstd"
+typing_run := "uv run --group typing --extra aws --extra encryption --extra numpy --extra ocsp --extra snappy --extra test --extra zstd"
Member

Instead of adding a new extra we could use --with numpy.

Contributor Author

This was a great suggestion. I had to change the invocations from uv run mypy . (or pytest) to run as modules, e.g. uv run python -m mypy ., but it works. I removed the extra from pyproject and requirements/.

elif dtype == BinaryVectorDtype.FLOAT32:
    data = np.frombuffer(self[2:], dtype="float32")
elif dtype == BinaryVectorDtype.PACKED_BIT:
    data = np.frombuffer(self[2:], dtype="uint8")
Member

@ShaneHarvey ShaneHarvey Oct 16, 2025

Are we following the rules of the spec for validating PACKED_BIT here (e.g. the validation in as_vector)?
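For context, a sketch of the kind of checks this question refers to. The rules below (padding between 0 and 7, and no padding on an empty vector) are an assumption about what the spec requires, not a copy of as_vector; validate_packed_bit is a hypothetical helper and the spec remains authoritative.

```python
import struct
from typing import Tuple

# Assumption: dtype byte for PACKED_BIT vectors in the BSON binary vector layout.
PACKED_BIT = 0x10


def validate_packed_bit(raw: bytes) -> Tuple[int, bytes]:
    """Hypothetical helper: check the header of a PACKED_BIT payload."""
    if len(raw) < 2:
        raise ValueError("Payload is too short to contain dtype and padding.")
    dtype, padding = struct.unpack_from("<BB", raw)
    if dtype != PACKED_BIT:
        raise ValueError(f"Expected PACKED_BIT (0x10), got {dtype:#04x}.")
    payload = raw[2:]
    if not 0 <= padding <= 7:
        raise ValueError(f"Padding ({padding}) must be between 0 and 7.")
    if padding and not payload:
        raise ValueError("Vector has padding but no data.")
    return padding, payload
```

A NumPy path that calls np.frombuffer directly on the bytes would skip checks like these unless they run first.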

@Jibola Jibola requested review from Jibola, ShaneHarvey and blink1073 and removed request for aclark4life October 16, 2025 18:58