Conversation

mraabhijit
@mraabhijit mraabhijit commented Oct 11, 2025

Fixes #57948

Summary

ArrowDtype.itemsize was incorrectly returning 8 bytes for date32[day] and other fixed-width PyArrow types because it always fell back to numpy_dtype.itemsize. This PR uses PyArrow's bit_width for fixed-width types and gracefully falls back to numpy for variable-width types and booleans.

Changes

  • Modified ArrowDtype.itemsize to use pyarrow_dtype.bit_width when available
  • Added comprehensive regression test covering the fix

Example

Before: ArrowDtype(pa.date32()).itemsize == 8
After: ArrowDtype(pa.date32()).itemsize == 4
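
For readers following along, the logic described in the summary can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `FakeFixedWidthType`, `FakeVariableWidthType`, `itemsize_sketch`, and the `is_boolean` attribute are hypothetical stand-ins so the snippet runs without pyarrow installed, and the sketch assumes that `bit_width` raises `ValueError` for variable-width types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeFixedWidthType:
    """Hypothetical stand-in for a fixed-width Arrow type such as date32."""

    bit_width: int
    is_boolean: bool = False


class FakeVariableWidthType:
    """Hypothetical stand-in for a variable-width Arrow type such as string."""

    is_boolean = False

    @property
    def bit_width(self) -> int:
        # Assumption: variable-width types raise instead of returning a width.
        raise ValueError("Non-fixed width type")


def itemsize_sketch(arrow_type, numpy_itemsize: int) -> int:
    """Return bytes per element, preferring the Arrow bit width."""
    # A boolean is 1 bit wide, so 1 // 8 would give 0; keep the numpy value.
    if arrow_type.is_boolean:
        return numpy_itemsize
    try:
        # Fixed-width types: convert bits to whole bytes.
        return arrow_type.bit_width // 8
    except (ValueError, AttributeError, NotImplementedError):
        # Variable-width types: fall back to the numpy itemsize.
        return numpy_itemsize


date32_like = FakeFixedWidthType(bit_width=32)
string_like = FakeVariableWidthType()
bool_like = FakeFixedWidthType(bit_width=1, is_boolean=True)

print(itemsize_sketch(date32_like, numpy_itemsize=8))  # fixed-width path
print(itemsize_sketch(string_like, numpy_itemsize=8))  # fallback path
print(itemsize_sketch(bool_like, numpy_itemsize=1))    # boolean path
```

With real pyarrow types the same shape would apply, with `pa.types.is_boolean(...)` in place of the hypothetical `is_boolean` attribute.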

Local Environment
  • OS: Windows-10-10.0.19045-SP0
  • arrow==1.3.0
  • python==3.11.13
  • numpy==2.3.3
  • pyarrow==21.0.0
  • pandas==3.0.0.dev0+2523.g1863adb252
  • pytest==8.4.2

Note: If additional information about the local environment is required, I would be happy to provide it.

@mraabhijit
Author

How should I address test failures that are not caused by the code changes in this commit? I would be glad to receive some guidance.

@mroeschke mroeschke added the Arrow pyarrow functionality label Oct 13, 2025
@mraabhijit
Author

@jbrockmendel could you provide some insight on this issue? My local CI pipeline passes all tests with the latest commit, but CI on the pandas repo does not.

@jbrockmendel
Member

Those failures are unrelated and affecting all PRs at the moment. We all need to wait a little bit for an upstream fix.

("decimal256", 32),
],
)
def test_arrow_dtype_itemsize_fixed_width(pa, type_name, expected_size):
Member

Can you move this test to pandas/tests/extension/test_arrow.py?

Author

Only this particular test, test_arrow_dtype_itemsize_fixed_width, or all 3? I assume all 3, since they test the same functionality.

Member

All the new tests in this file

Comment on lines +2334 to +2335
if pa.types.is_boolean(self.pyarrow_dtype):
    return self.numpy_dtype.itemsize
Member

Can you move this outside the try except?

Author

Sure thing. Do you think it would be better to also catch NotImplementedError in the except block?

Member

catch NotImplementedError in the except block?

Yes, or experiment to see which exceptions .bit_width could raise on the pyarrow side.

Author

I have made the corrections and verified them by running the CI routine on my fork; it works as expected.
Waiting for the CI routine to be fixed on this repo.

Labels

Arrow pyarrow functionality

Development

Successfully merging this pull request may close these issues.

BUG: itemsize wrong for date32[day][pyarrow] dtype?

3 participants