-
-
Notifications
You must be signed in to change notification settings - Fork 19.1k
FIX: itemsize wrong for date32[day][pyarrow] dtype #57948 #62657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
FIX: itemsize wrong for date32[day][pyarrow] dtype #57948 #62657
Conversation
How to address test fails not due to commit code changes? Will be glad to receive some guidance. |
@jbrockmendel could you provide some insights on this issue? My local CI pipeline has passed all tests with the last commit but not on the pandas repo. |
Those failures are unrelated and affecting all PRs at the moment. We all need to wait a little bit for an upstream fix. |
("decimal256", 32), | ||
], | ||
) | ||
def test_arrow_dtype_itemsize_fixed_width(pa, type_name, expected_size): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this test to pandas/tests/extension/test_arrow.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only this particular test test_arrow_dtype_itemsize_fixed_width
or all 3? I guess all 3 as they test the same functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the new tests in this file
if pa.types.is_boolean(self.pyarrow_dtype): | ||
return self.numpy_dtype.itemsize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this outside the try except?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. Do you think it would be better to also catch NotImplementedError
in the except block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
catch NotImplementedError in the except block?
Yes or experiment what exceptions .bit_width
could raise on the pyarrow side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done the corections and checked running the CI routine on my fork, works as expected.
Waiting for the CI routine to be fixed on this repo.
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Fixes #57948
Summary
ArrowDtype.itemsize
was incorrectly returning8
bytes fordate32[day]
and otherfixed-width
PyArrow types because it always fell back tonumpy_dtype.itemsize
. This PR uses PyArrow'sbit_width
forfixed-width
types and gracefully falls back tonumpy
forvariable-width
types andbooleans
.Changes
ArrowDtype.itemsize
to usepyarrow_dtype.bit_width
when availableExample
Before:
ArrowDtype(pa.date32()).itemsize == 8
After:
ArrowDtype(pa.date32()).itemsize == 4
Local Environment
NOTE : If additional information required on the local environment, would be happy to provide.