ENH: pd.NamedAgg forwards *args and **kwargs to aggfunc #62729
Thanks for the PR!
@set_module("pandas")
class NamedAgg(NamedTuple):
This seems too breaking. Previously, users could access NamedAgg.column after creation, but not if we inherit from tuple. Can we use a dataclass here instead?
@dataclasses.dataclass
class NamedAgg:
    column: Hashable
    aggfunc: AggScalar
    args: tuple = ()
    kwargs: dict = dataclasses.field(default_factory=dict)

    def __init__(self, column: Hashable, aggfunc: AggScalar, *args, **kwargs) -> None:
        self.column = column
        self.aggfunc = aggfunc
        self.args = args
        self.kwargs = kwargs

    def __getitem__(self, key: int):
        if key == 0:
            return self.column
        elif key == 1:
            return self.aggfunc
        elif key == 2:
            return self.args
        elif key == 3:
            return self.kwargs
        raise IndexError("index out of range")
We could then possibly deprecate __getitem__ access.
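As a rough illustration of that deprecation path, here is a self-contained sketch. It is not the pandas implementation: `NamedAggSketch` is a hypothetical stand-in class, the `Union[Callable[..., Any], str]` annotation only approximates pandas' internal `AggScalar` alias, and the warning text is invented for the example.

```python
import dataclasses
import warnings
from collections.abc import Callable, Hashable
from typing import Any, Union


@dataclasses.dataclass
class NamedAggSketch:
    """Hypothetical stand-in for the suggested dataclass; not pandas.NamedAgg."""

    column: Hashable
    aggfunc: Union[Callable[..., Any], str]  # approximation of AggScalar
    args: tuple = ()
    kwargs: dict = dataclasses.field(default_factory=dict)

    # A hand-written __init__ is kept by @dataclass, so extra positional and
    # keyword arguments can be collected into args / kwargs.
    def __init__(self, column, aggfunc, *args, **kwargs) -> None:
        self.column = column
        self.aggfunc = aggfunc
        self.args = args
        self.kwargs = kwargs

    def __getitem__(self, key: int):
        # Tuple-style access keeps working but warns (message is illustrative).
        warnings.warn(
            "Index access is deprecated; use attribute access instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if key == 0:
            return self.column
        elif key == 1:
            return self.aggfunc
        elif key == 2:
            return self.args
        elif key == 3:
            return self.kwargs
        raise IndexError("index out of range")


agg = NamedAggSketch("b", "quantile", 0.25, interpolation="linear")
print(agg.column, agg.aggfunc, agg.args, agg.kwargs)
```

Attribute access stays silent, while `agg[0]` still returns the column and emits a `DeprecationWarning`, which gives existing tuple-style callers a migration window.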
>>> agg_1 = pd.NamedAgg(column=1, aggfunc=lambda x: np.mean(x))
>>> df.groupby("key").agg(result_a=agg_a, result_1=agg_1)
     result_a  result_1
>>> agg_b = pd.NamedAgg(column="b", aggfunc=lambda x: x.mean())
I think the point here is to demonstrate that you can use a named tuple on columns that are not strings.
        return original_func(series, *final_args, **final_kwargs)

    wrapped._is_wrapped = True  # type: ignore[attr-defined]
    aggfunc = wrapped
In line with the above, this changes aggfunc, which is a public attribute. Instead, I think we should use args/kwargs in the places within pandas that accept a NamedAgg.
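A minimal sketch of that alternative, under stated assumptions: `AggSpec`, `apply_spec`, and `trimmed_mean` are hypothetical names made up for this example, and a plain dict of lists stands in for a DataFrame. The point is only that the consumer forwards `args`/`kwargs` at call time, so the stored `aggfunc` is never replaced by a wrapper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class AggSpec:
    """Hypothetical spec object standing in for a NamedAgg with args/kwargs."""

    column: str
    aggfunc: Callable[..., Any]
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)


def apply_spec(columns: dict, spec: AggSpec) -> Any:
    """Hypothetical consumer: forwards args/kwargs when calling aggfunc."""
    values = columns[spec.column]
    # aggfunc is called as stored; nothing is wrapped or reassigned.
    return spec.aggfunc(values, *spec.args, **spec.kwargs)


def trimmed_mean(values, trim=0):
    """Mean after dropping `trim` values from each end of the sorted data."""
    ordered = sorted(values)
    kept = ordered[trim : len(ordered) - trim]
    return sum(kept) / len(kept)


spec = AggSpec("b", trimmed_mean, kwargs={"trim": 1})
result = apply_spec({"b": [1.0, 2.0, 3.0, 100.0]}, spec)
print(result)  # 2.5
print(spec.aggfunc is trimmed_mean)  # True: the public attribute is unchanged
```

Because the forwarding happens in the consumer, code that inspects `spec.aggfunc` still sees the function the user passed in.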
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.