[API/REF]: SparseArray is an ExtensionArray (#22325)
Makes SparseArray an ExtensionArray.
* Fixed DataFrame.__setitem__ for updating to sparse.
Closes #22367
* Fixed Series[sparse].to_sparse
Closes #22389
Closes #21978
Closes #19506
Closes #22835
doc/source/whatsnew/v0.24.0.txt (46 additions, 7 deletions)
@@ -380,6 +380,37 @@ is the case with :attr:`Period.end_time`, for example

 p.end_time

+.. _whatsnew_0240.api_breaking.sparse_values:
+
+Sparse Data Structure Refactor
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``SparseArray``, the array backing ``SparseSeries`` and the columns in a ``SparseDataFrame``,
+is now an extension array (:issue:`21978`, :issue:`19056`, :issue:`22835`).
+To conform to this interface and for consistency with the rest of pandas, some API breaking
+changes were made:
+
+- ``SparseArray`` is no longer a subclass of :class:`numpy.ndarray`. To convert a ``SparseArray`` to a NumPy array, use :meth:`numpy.asarray`.
+- ``SparseArray.dtype`` and ``SparseSeries.dtype`` are now instances of :class:`SparseDtype`, rather than ``np.dtype``. Access the underlying dtype with ``SparseDtype.subtype``.
+- ``numpy.asarray(sparse_array)`` now returns a dense array with all the values, not just the non-fill-value values (:issue:`14167`)
+- ``SparseArray.take`` now matches the API of :meth:`pandas.api.extensions.ExtensionArray.take` (:issue:`19506`):
+
+  * The default value of ``allow_fill`` has changed from ``False`` to ``True``.
+  * The ``out`` and ``mode`` parameters are no longer accepted (previously, passing them raised an error).
+  * Passing a scalar for ``indices`` is no longer allowed.
+
+- The result of concatenating a mix of sparse and dense Series is a Series with sparse values, rather than a ``SparseSeries``.
+- ``SparseDataFrame.combine`` and ``DataFrame.combine_first`` no longer support combining a sparse column with a dense column while preserving the sparse subtype. The result will be an object-dtype ``SparseArray``.
+- Setting :attr:`SparseArray.fill_value` to a fill value with a different dtype is now allowed.
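The changes listed above can be sketched roughly as follows. This is a minimal illustration, not part of the commit: it assumes a pandas release that exposes the class as ``pandas.arrays.SparseArray``, and it passes ``allow_fill`` explicitly because the default is not the same in every release.

```python
import numpy as np
import pandas as pd

# Assumption: SparseArray is importable from pandas.arrays in the installed release.
arr = pd.arrays.SparseArray([0.0, 1.0, 0.0, 2.0], fill_value=0.0)

# No longer an ndarray subclass; np.asarray returns the full dense data.
is_ndarray = isinstance(arr, np.ndarray)
dense = np.asarray(arr)

# The dtype is a SparseDtype that wraps the underlying NumPy dtype.
sparse_dtype = arr.dtype
subtype = arr.dtype.subtype

# take follows ExtensionArray.take: with allow_fill=True, -1 marks a missing
# slot; with allow_fill=False, -1 is an ordinary "last element" index.
with_fill = np.asarray(arr.take([1, -1], allow_fill=True))
without_fill = np.asarray(arr.take([1, -1], allow_fill=False))

# A scalar for ``indices`` is rejected.
try:
    arr.take(1)
    scalar_rejected = False
except ValueError:
    scalar_rejected = True
```

Passing ``allow_fill`` by keyword keeps the example independent of whichever default the installed release uses.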
+
+
+Some new warnings are issued for operations that require or are likely to materialize a large dense array:
+
+- A :class:`errors.PerformanceWarning` is issued when using ``fillna`` with a ``method``, as a dense array is constructed to create the filled array. Filling with a ``value`` is the efficient way to fill a sparse array.
+- A :class:`errors.PerformanceWarning` is now issued when concatenating sparse Series with differing fill values. The fill value from the first sparse array continues to be used.
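To make the "fill with a ``value``" recommendation concrete, here is a small sketch. It assumes ``pandas.arrays.SparseArray`` is available and uses made-up data; it only shows the scalar path, which is expected to leave the array sparse.

```python
import numpy as np
import pandas as pd

# Assumption: SparseArray is importable from pandas.arrays in the installed release.
# With float data and no explicit fill_value, the fill value is NaN.
arr = pd.arrays.SparseArray([1.0, np.nan, np.nan, 2.0])

# Filling with a scalar only needs to update the stored values and the fill value.
filled = arr.fillna(0.0)

dense = np.asarray(filled)
n_stored = len(filled.sp_values)  # number of explicitly stored values
```

Only the two non-fill values remain stored after the call, so no dense intermediate of the full length is needed for this path.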
+
+In addition to these API breaking changes, many :ref:`performance improvements and bug fixes have been made <whatsnew_0240.bug_fixes.sparse>`.
Raise ValueError in ``DataFrame.to_dict(orient='index')``
@@ -573,6 +604,7 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your
 - Added :meth:`pandas.api.types.register_extension_dtype` to register an extension type with pandas (:issue:`22664`)
 - Series backed by an ``ExtensionArray`` now work with :func:`util.hash_pandas_object` (:issue:`23066`)
 - Updated the ``.type`` attribute for ``PeriodDtype``, ``DatetimeTZDtype``, and ``IntervalDtype`` to be instances of the dtype (``Period``, ``Timestamp``, and ``Interval`` respectively) (:issue:`22938`)
+- :func:`ExtensionArray.isna` is allowed to return an ``ExtensionArray`` (:issue:`22325`).
 - Support for reduction operations such as ``sum``, ``mean`` via opt-in base class method override (:issue:`22762`)
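As an illustration of the ``isna`` entry above, the sketch below checks what ``SparseArray.isna`` returns. It assumes ``pandas.arrays.SparseArray`` is available; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumption: SparseArray is importable from pandas.arrays in the installed release.
arr = pd.arrays.SparseArray([1.0, np.nan, 3.0])

# The mask is itself an extension array (sparse booleans), not a plain ndarray.
mask = arr.isna()
mask_is_extension_array = isinstance(mask, pd.api.extensions.ExtensionArray)

# Convert to a dense boolean ndarray when a NumPy mask is needed.
dense_mask = np.asarray(mask, dtype=bool)
```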

 .. _whatsnew_0240.api.incompatibilities:
@@ -655,6 +687,7 @@ Other API Changes
 - :class:`pandas.io.formats.style.Styler` supports a ``number-format`` property when using :meth:`~pandas.io.formats.style.Styler.to_excel` (:issue:`22015`)
 - :meth:`DataFrame.corr` and :meth:`Series.corr` now raise a ``ValueError`` along with a helpful error message instead of a ``KeyError`` when supplied with an invalid method (:issue:`22298`)
 - :meth:`shift` will now always return a copy, instead of the previous behaviour of returning self when shifting by 0 (:issue:`22397`)
+- Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`)
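The row-slicing behaviour described in the last entry can be sketched with any extension dtype. The example below uses the nullable ``Int64`` dtype purely as an assumption for illustration; the commit itself does not name a specific dtype.

```python
import pandas as pd

# Two columns backed by the same extension dtype (nullable Int64, chosen for illustration).
df = pd.DataFrame(
    {
        "a": pd.array([1, 2], dtype="Int64"),
        "b": pd.array([3, 4], dtype="Int64"),
    }
)

# Selecting a single row yields a Series that keeps the shared extension dtype.
row = df.iloc[0]
```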

 .. _whatsnew_0240.deprecations:

@@ -896,13 +929,6 @@ Groupby/Resample/Rolling
 - :func:`RollingGroupby.agg` and :func:`ExpandingGroupby.agg` now support multiple aggregation functions as parameters (:issue:`15072`)
 - Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` when resampling by a weekly offset (``'W'``) across a DST transition (:issue:`9119`, :issue:`21459`)

-Sparse
-^^^^^^
-
--
--
--
-
 Reshaping
 ^^^^^^^^^

@@ -921,6 +947,19 @@ Reshaping
 - Bug in :func:`merge_asof` when merging on float values within defined tolerance (:issue:`22981`)
 - Bug in :func:`pandas.concat` when concatenating a multicolumn DataFrame with tz-aware data against a DataFrame with a different number of columns (:issue:`22796`)

+.. _whatsnew_0240.bug_fixes.sparse:
+
+Sparse
+^^^^^^
+
+- Updating a boolean, datetime, or timedelta column to be Sparse now works (:issue:`22367`)
+- Bug in :meth:`Series.to_sparse` with Series already holding sparse data not constructing properly (:issue:`22389`)
+- Providing a ``sparse_index`` to the SparseArray constructor no longer defaults the na-value to ``np.nan`` for all dtypes. The correct na_value for ``data.dtype`` is now used.
+- Bug in ``SparseArray.nbytes`` under-reporting its memory usage by not including the size of its sparse index.
+- Improved performance of :meth:`Series.shift` for non-NA ``fill_value``, as values are no longer converted to a dense array.
+- Bug in ``DataFrame.groupby`` not including ``fill_value`` in the groups for non-NA ``fill_value`` when grouping by a sparse column (:issue:`5078`)
+- Bug in unary inversion operator (``~``) on a ``SparseSeries`` with boolean values. The performance of this has also been improved (:issue:`22835`)
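The ``nbytes`` fix above can be checked with a short sketch. It assumes ``pandas.arrays.SparseArray`` is available and that the sparse index exposes an ``nbytes`` attribute, as it does in the releases this was written against.

```python
import pandas as pd

# Assumption: SparseArray is importable from pandas.arrays in the installed release.
arr = pd.arrays.SparseArray([0, 0, 5, 0, 7], fill_value=0)

# Memory used by the explicitly stored values and by the index that locates them.
values_bytes = arr.sp_values.nbytes
index_bytes = arr.sp_index.nbytes

# After the fix, the reported size should cover both parts.
total_bytes = arr.nbytes
```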