Conversation

@trivialfis
Member

Don't merge. I'm not sure if we need to support it at the moment, as arrow-backed data is still experimental in Pandas, and the documentation on how to create a categorical feature is limited.

@trivialfis changed the title [wip] Support arrow-backed pandas categorical columns. @trivialfis [wip] Support arrow-backed pandas categorical columns. Nov 10, 2025
Comment on lines +352 to +356
c_typ = pa.DictionaryArray.from_arrays(
    pa.array([0, 1, 2]),
    pa.array(["cdef", "abc", "def"], type=pa.large_utf8()),
)
c_ser = pd.Series(c_typ, dtype=pd.ArrowDtype(c_typ.type))

When constructing c_ser from c_typ, I would suggest doing something like

c_array = pd.arrays.ArrowExtensionArray(c_typ)
c_ser = pd.Series(c_array)

Member Author

Thank you for pointing that out. I need to create a new test; this one doesn't work because arrow dictionary-backed columns have no cat attribute (the existing test relies on this interface).

@mroeschke

the documentation on how to create a categorical feature is limited

Yeah, currently there isn't great support for Arrow-based categorical data in pandas. I think pandas should be able to hold the data, and some common APIs across all types may be supported, but categorical specific APIs are not currently supported.

@trivialfis
Member Author

but categorical specific APIs are not currently supported

Thank you for sharing, noted. I will watch for pandas' future updates.
