ENH: Add new implementation of DataFrame.stack (#53921)
* DEPR: Add new implementation of DataFrame.stack and deprecate old
* Merge cleanup
* Revert filterwarnings in conf.py
* Merge fixup
* Rename inner function
* v3->future_stack; other refinements
* Fixup docstring
* Docstring fixup
This has been deprecated, and in the future :meth:`Categorical.map` will change the default
to ``na_action=None``, like for all the other array types.

.. _whatsnew_210.enhancements.new_stack:

New implementation of :meth:`DataFrame.stack`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

pandas has reimplemented :meth:`DataFrame.stack`. To use the new implementation, pass the argument ``future_stack=True``. This will become the only option in pandas 3.0.

The previous implementation had two main behavioral downsides.

1. The previous implementation would unnecessarily introduce NA values into the result. The user could have NA values automatically removed by passing ``dropna=True`` (the default), but doing this could also remove NA values from the result that existed in the input. See the examples below.
2. The previous implementation with ``sort=True`` (the default) would sometimes sort part of the resulting index, and sometimes not. If the input's columns are *not* a :class:`MultiIndex`, then the resulting index would never be sorted. If the columns are a :class:`MultiIndex`, then in most cases the level(s) in the resulting index that come from stacking the column level(s) would be sorted. In rare cases such level(s) would be sorted in a non-standard order, depending on how the columns were created.
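Both downsides can be illustrated with a small plain-Python sketch. It is a toy model of the observable behavior, not pandas' actual code: the hypothetical helper ``toy_stack_old`` builds every combination of the values of each column level (assuming the common case where those values come out sorted), fills combinations that are not real columns with NaN, and optionally drops every NaN afterwards. The row labels, column tuples and values are made up for illustration.

```python
import math
from itertools import product

NA = math.nan

def toy_stack_old(rows, columns, data, dropna=True):
    """Toy model of the previous multi-level stack (not pandas' code).

    rows: row labels; columns: column tuples, one entry per column level;
    data: {(row, column_tuple): value}.
    """
    n_levels = len(columns[0])
    # Unique values of each column level, sorted.
    level_values = [sorted({col[i] for col in columns}) for i in range(n_levels)]
    result = []
    for row in rows:  # row order is kept as-is
        # Every combination of level values, whether or not it is a real
        # column; combinations that are not columns get an introduced NA.
        for combo in product(*level_values):
            result.append(((row, *combo), data.get((row, combo), NA)))
    if dropna:
        # Drops introduced NAs, but cannot tell them apart from input NAs.
        result = [(key, value) for key, value in result if not math.isnan(value)]
    return result

# Hypothetical frame: rows "z" and "y", columns ("B", "d") and ("A", "c").
rows = ["z", "y"]
columns = [("B", "d"), ("A", "c")]
data = {("z", ("B", "d")): 0, ("z", ("A", "c")): 2,
        ("y", ("B", "d")): 1, ("y", ("A", "c")): 3}

print(len(toy_stack_old(rows, columns, data, dropna=False)))  # 8, four of them NA
for key, value in toy_stack_old(rows, columns, data):
    print(key, value)  # ("A", "c") now comes before ("B", "d")
```

If an NA is placed in the input (for example ``data[("z", ("B", "d"))] = NA``), the default ``dropna=True`` in this toy model removes that cell too, which is the first downside described above.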

The new implementation (``future_stack=True``) will no longer unnecessarily introduce NA values when stacking multiple levels and will never sort. As such, the arguments ``dropna`` and ``sort`` are not utilized and must remain unspecified when using ``future_stack=True``. These arguments will be removed in the next major release.

In the previous version (``future_stack=False``), the default of ``dropna=True`` would remove unnecessarily introduced NA values but still coerce the dtype to ``float64`` in the process. In the new version, no NAs are introduced and so there is no coercion of the dtype.
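The dtype difference can be sketched in plain Python as well. This is again a toy model, not pandas code: the hypothetical helper ``toy_stack_new`` emits exactly one entry per existing cell in input order, and the standard-library ``array`` module stands in for a typed column, with typecode ``"q"`` (a signed ``long long``, 8 bytes on common platforms) playing the role of ``int64`` and ``"d"`` (a C ``double``) playing ``float64``.

```python
import math
from array import array

def toy_stack_new(rows, columns, data):
    """Toy model of the new stack (not pandas' code): one (key, value) pair
    per existing cell, in input order, so nothing is sorted and no NA is added."""
    return [((row, *col), data[(row, col)]) for row in rows for col in columns]

# Hypothetical integer frame: rows "z" and "y", columns ("B", "d") and ("A", "c").
rows = ["z", "y"]
columns = [("B", "d"), ("A", "c")]
data = {("z", ("B", "d")): 0, ("z", ("A", "c")): 2,
        ("y", ("B", "d")): 1, ("y", ("A", "c")): 3}

# New behavior: no NA is introduced, so the values still fit an int64 buffer.
new_values = array("q", (value for _, value in toy_stack_new(rows, columns, data)))

# Previous behavior: NaN placeholders for the nonexistent ("A", "d") and
# ("B", "c") combinations, in sorted-level order.
with_placeholders = [2, math.nan, math.nan, 0, 3, math.nan, math.nan, 1]
try:
    array("q", with_placeholders)  # an int64 buffer has no way to store NaN
except TypeError:
    print("int64 cannot hold NaN")

# So the values are upcast to float64 first; dropping the NaNs afterwards
# (dropna=True) does not turn them back into integers.
upcast = array("d", with_placeholders)
old_values = array("d", (v for v in upcast if not math.isnan(v)))

print(new_values.typecode, list(new_values))  # q [0, 2, 1, 3]
print(old_values.typecode, list(old_values))  # d [2.0, 0.0, 3.0, 1.0]
```

The point of the sketch is the ordering of operations: once a NaN placeholder has forced the values into a float buffer, removing the placeholders afterwards leaves the float type in place.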

.. ipython:: python
    :okwarning:

    df.stack([0, 1], future_stack=False, dropna=True)
    df.stack([0, 1], future_stack=True)

If the input contains NA values, the previous version would drop those as well with ``dropna=True`` or introduce new NA values with ``dropna=False``. The new version persists all values from the input.
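A compact plain-Python comparison shows the three outcomes side by side. Both helpers are hypothetical toy models of the behavior described above, not pandas code, and the input frame (whose ``"y"`` row is entirely NA) is made up for illustration.

```python
import math
from itertools import product

NA = math.nan

def toy_stack_old(rows, columns, data, dropna=True):
    # Toy model of the previous behavior: product of sorted level values,
    # with NA for combinations that are not real columns.
    levels = [sorted({col[i] for col in columns}) for i in range(len(columns[0]))]
    out = [((r, *c), data.get((r, c), NA)) for r in rows for c in product(*levels)]
    # dropna removes every NA, whether introduced or present in the input.
    return [(k, v) for k, v in out if not math.isnan(v)] if dropna else out

def toy_stack_new(rows, columns, data):
    # Toy model of the new behavior: one entry per input cell, in input order.
    return [((r, *c), data[(r, c)]) for r in rows for c in columns]

rows = ["z", "y"]
columns = [("B", "d"), ("A", "c")]
# Hypothetical input whose "y" row is entirely NA.
data = {("z", ("B", "d")): 0, ("z", ("A", "c")): 2,
        ("y", ("B", "d")): NA, ("y", ("A", "c")): NA}

old_dropna = toy_stack_old(rows, columns, data, dropna=True)   # input NAs lost
old_keepna = toy_stack_old(rows, columns, data, dropna=False)  # 4 extra NAs added
new = toy_stack_new(rows, columns, data)                        # input kept as-is

print(len(old_dropna), len(old_keepna), len(new))  # 2 8 4
```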