Commit e87248e

API: Adding pandas.api.typing.aliases and docs (#61735)
1 parent b2a6d74 commit e87248e

10 files changed: +339 -9 lines changed

doc/source/development/contributing_codebase.rst

Lines changed: 2 additions & 1 deletion
@@ -214,7 +214,8 @@ With custom types and inference this is not always possible so exceptions are ma
 pandas-specific types
 ~~~~~~~~~~~~~~~~~~~~~

-Commonly used types specific to pandas will appear in `pandas._typing <https://github.com/pandas-dev/pandas/blob/main/pandas/_typing.py>`_ and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas.
+Commonly used types specific to pandas will appear in `pandas._typing <https://github.com/pandas-dev/pandas/blob/main/pandas/_typing.py>`__ and you should use these where applicable. This module is private and is meant for pandas development.
+Types that are meant for user consumption should be exposed in `pandas.api.typing.aliases <https://github.com/pandas-dev/pandas/blob/main/pandas/api/typing/aliases.py>`__ and ideally added to the `pandas-stubs <https://github.com/pandas-dev/pandas-stubs>`__ project.

 For example, quite a few functions in pandas accept a ``dtype`` argument. This can be expressed as a string like ``"object"``, a ``numpy.dtype`` like ``np.int64`` or even a pandas ``ExtensionDtype`` like ``pd.CategoricalDtype``. Rather than burden the user with having to constantly annotate all of those options, this can simply be imported and reused from the pandas._typing module

doc/source/reference/aliases.rst

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
+{{ header }}
+
+.. _api.typing.aliases:
+
+======================================
+pandas typing aliases
+======================================
+
+**************
+Typing aliases
+**************
+
+.. currentmodule:: pandas.api.typing.aliases
+
+The typing declarations in ``pandas/_typing.py`` are considered private, and used
+by pandas developers for type checking of the pandas code base. For users, it is
+highly recommended to use the ``pandas-stubs`` package that represents the officially
+supported type declarations for users of pandas.
+They are documented here for users who wish to use these declarations in their
+own python code that calls pandas or expects certain results.
+
+.. warning::
+
+    Note that the definitions and use cases of these aliases are subject to change without notice in any major, minor, or patch release of pandas.
+
+Each of these aliases listed in the table below can be found by importing them from :py:mod:`pandas.api.typing.aliases`.
+
+==================================== ================================================================
+Alias                                Meaning
+==================================== ================================================================
+:py:type:`AggFuncType`               Type of functions that can be passed to :meth:`agg` methods
+:py:type:`AlignJoin`                 Argument type for ``join`` in :meth:`DataFrame.join`
+:py:type:`AnyAll`                    Argument type for ``how`` in :meth:`dropna`
+:py:type:`AnyArrayLike`              Used to represent :class:`ExtensionArray`, ``numpy`` arrays, :class:`Index` and :class:`Series`
+:py:type:`ArrayLike`                 Used to represent :class:`ExtensionArray`, ``numpy`` arrays
+:py:type:`AstypeArg`                 Argument type in :meth:`astype`
+:py:type:`Axes`                      :py:type:`AnyArrayLike` plus sequences (not strings) and ``range``
+:py:type:`Axis`                      Argument type for ``axis`` in many methods
+:py:type:`CSVEngine`                 Argument type for ``engine`` in :meth:`pandas.read_csv`
+:py:type:`ColspaceArgType`           Argument type for ``col_space`` in :meth:`DataFrame.to_html`
+:py:type:`CompressionOptions`        Argument type for ``compression`` in all I/O output methods except :meth:`DataFrame.to_parquet`
+:py:type:`CorrelationMethod`         Argument type for ``method`` in :meth:`corr`
+:py:type:`DropKeep`                  Argument type for ``keep`` in :meth:`drop_duplicates`
+:py:type:`Dtype`                     Types as objects that can be used to specify dtypes
+:py:type:`DtypeArg`                  Argument type for ``dtype`` in various methods
+:py:type:`DtypeBackend`              Argument type for ``dtype_backend`` in various methods
+:py:type:`DtypeObj`                  Numpy dtypes and Extension dtypes
+:py:type:`ExcelWriterIfSheetExists`  Argument type for ``if_sheet_exists`` in :class:`ExcelWriter`
+:py:type:`ExcelWriterMergeCells`     Argument type for ``merge_cells`` in :meth:`to_excel`
+:py:type:`FilePath`                  Type of paths for files for I/O methods
+:py:type:`FillnaOptions`             Argument type for ``method`` in various methods where NA values are filled
+:py:type:`FloatFormatType`           Argument type for ``float_format`` in :meth:`to_string`
+:py:type:`FormattersType`            Argument type for ``formatters`` in :meth:`to_string`
+:py:type:`FromDictOrient`            Argument type for ``orient`` in :meth:`DataFrame.from_dict`
+:py:type:`HTMLFlavors`               Argument type for ``flavor`` in :meth:`pandas.read_html`
+:py:type:`IgnoreRaise`               Argument type for ``errors`` in multiple methods
+:py:type:`IndexLabel`                Argument type for ``level`` in multiple methods
+:py:type:`InterpolateOptions`        Argument type for ``method`` in :meth:`interpolate`
+:py:type:`JSONEngine`                Argument type for ``engine`` in :meth:`read_json`
+:py:type:`JSONSerializable`          Argument type for the return type of a callable for argument ``default_handler`` in :meth:`to_json`
+:py:type:`JoinHow`                   Argument type for ``how`` in :meth:`pandas.merge_ordered` and for ``join`` in :meth:`Series.align`
+:py:type:`JoinValidate`              Argument type for ``validate`` in :meth:`DataFrame.join`
+:py:type:`MergeHow`                  Argument type for ``how`` in :meth:`merge`
+:py:type:`MergeValidate`             Argument type for ``validate`` in :meth:`merge`
+:py:type:`NaPosition`                Argument type for ``na_position`` in :meth:`sort_index` and :meth:`sort_values`
+:py:type:`NsmallestNlargestKeep`     Argument type for ``keep`` in :meth:`nlargest` and :meth:`nsmallest`
+:py:type:`OpenFileErrors`            Argument type for ``errors`` in :meth:`to_hdf` and :meth:`to_csv`
+:py:type:`Ordered`                   Return type for :py:attr:`ordered` in :class:`CategoricalDtype` and :class:`Categorical`
+:py:type:`ParquetCompressionOptions` Argument type for ``compression`` in :meth:`DataFrame.to_parquet`
+:py:type:`QuantileInterpolation`     Argument type for ``interpolation`` in :meth:`quantile`
+:py:type:`ReadBuffer`                Additional argument type corresponding to buffers for various file reading methods
+:py:type:`ReadCsvBuffer`             Additional argument type corresponding to buffers for :meth:`pandas.read_csv`
+:py:type:`ReadPickleBuffer`          Additional argument type corresponding to buffers for :meth:`pandas.read_pickle`
+:py:type:`ReindexMethod`             Argument type for ``method`` in :meth:`reindex`
+:py:type:`Scalar`                    Types that can be stored in :class:`Series` with non-object dtype
+:py:type:`SequenceNotStr`            Used for arguments that require sequences, but not plain strings
+:py:type:`SliceType`                 Argument types for ``start`` and ``end`` in :meth:`Index.slice_locs`
+:py:type:`SortKind`                  Argument type for ``kind`` in :meth:`sort_index` and :meth:`sort_values`
+:py:type:`StorageOptions`            Argument type for ``storage_options`` in various file output methods
+:py:type:`Suffixes`                  Argument type for ``suffixes`` in :meth:`merge`, :meth:`compare` and :meth:`merge_ordered`
+:py:type:`TakeIndexer`               Argument type for ``indexer`` and ``indices`` in :meth:`take`
+:py:type:`TimeAmbiguous`             Argument type for ``ambiguous`` in time operations
+:py:type:`TimeGrouperOrigin`         Argument type for ``origin`` in :meth:`resample` and :class:`TimeGrouper`
+:py:type:`TimeNonexistent`           Argument type for ``nonexistent`` in time operations
+:py:type:`TimeUnit`                  Time unit argument and return type for :py:attr:`unit`, arguments ``unit`` and ``date_unit``
+:py:type:`TimedeltaConvertibleTypes` Argument type for ``offset`` in :meth:`resample`, ``halflife`` in :meth:`ewm` and ``start`` and ``end`` in :meth:`pandas.timedelta_range`
+:py:type:`TimestampConvertibleTypes` Argument type for ``origin`` in :meth:`resample` and :meth:`pandas.to_datetime`
+:py:type:`ToStataByteorder`          Argument type for ``byteorder`` in :meth:`DataFrame.to_stata`
+:py:type:`ToTimestampHow`            Argument type for ``how`` in :meth:`to_timestamp` and ``convention`` in :meth:`resample`
+:py:type:`UpdateJoin`                Argument type for ``join`` in :meth:`DataFrame.update`
+:py:type:`UsecolsArgType`            Argument type for ``usecols`` in :meth:`pandas.read_clipboard`, :meth:`pandas.read_csv` and :meth:`pandas.read_excel`
+:py:type:`WindowingRankType`         Argument type for ``method`` in :meth:`rank` in rolling and expanding window operations
+:py:type:`WriteBuffer`               Additional argument type corresponding to buffers for various file output methods
+:py:type:`WriteExcelBuffer`          Additional argument type corresponding to buffers for :meth:`to_excel`
+:py:type:`XMLParsers`                Argument type for ``parser`` in :meth:`DataFrame.to_xml` and :meth:`pandas.read_xml`
+==================================== ================================================================
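
As a rough illustration of how the aliases documented in this file might be used downstream, the sketch below annotates a small helper with `Axis`. It assumes a pandas release that ships `pandas.api.typing.aliases` (the whatsnew entry in this commit places it in 3.0.0). `column_means` is a hypothetical function, not part of pandas. The import sits under `TYPE_CHECKING` and the annotations are written as strings, so the aliases are only resolved by a static type checker and the module still imports where the submodule is missing.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers such as mypy or pyright.
    # Assumes a pandas release that includes pandas.api.typing.aliases.
    import pandas as pd
    from pandas.api.typing.aliases import Axis


def column_means(df: "pd.DataFrame", axis: "Axis" = 0) -> "pd.Series":
    """Hypothetical helper: forward ``axis`` to ``DataFrame.mean``."""
    return df.mean(axis=axis)
```

Since the documentation warns that alias definitions may change in any release, keeping them out of runtime code paths in this way is one option for limiting exposure to such changes.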

doc/source/reference/index.rst

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ to be stable.
    extensions
    testing
    missing_value
+   aliases

 .. This is to prevent warnings in the doc build. We don't want to encourage
 .. these methods.

doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -220,6 +220,7 @@ Other enhancements
 - Implemented :meth:`Series.str.isascii` and :meth:`Series.str.isascii` (:issue:`59091`)
 - Improve the resulting dtypes in :meth:`DataFrame.where` and :meth:`DataFrame.mask` with :class:`ExtensionDtype` ``other`` (:issue:`62038`)
 - Improved deprecation message for offset aliases (:issue:`60820`)
+- Many type aliases are now exposed in the new submodule :py:mod:`pandas.api.typing.aliases` (:issue:`55231`)
 - Multiplying two :class:`DateOffset` objects will now raise a ``TypeError`` instead of a ``RecursionError`` (:issue:`59442`)
 - Restore support for reading Stata 104-format and enable reading 103-format dta files (:issue:`58554`)
 - Support passing a :class:`Iterable[Hashable]` input to :meth:`DataFrame.drop_duplicates` (:issue:`59237`)

pandas/_typing.py

Lines changed: 24 additions & 2 deletions
@@ -133,8 +133,27 @@ def __reversed__(self) -> Iterator[_T_co]: ...

 PythonScalar: TypeAlias = str | float | bool
 DatetimeLikeScalar: TypeAlias = Union["Period", "Timestamp", "Timedelta"]
-PandasScalar: TypeAlias = Union["Period", "Timestamp", "Timedelta", "Interval"]
-Scalar: TypeAlias = PythonScalar | PandasScalar | np.datetime64 | np.timedelta64 | date
+
+# aligned with pandas-stubs - typical scalars found in Series. Explicitly leaves
+# out object
+_IndexIterScalar: TypeAlias = Union[
+    str,
+    bytes,
+    date,
+    datetime,
+    timedelta,
+    np.datetime64,
+    np.timedelta64,
+    bool,
+    int,
+    float,
+    "Timestamp",
+    "Timedelta",
+]
+Scalar: TypeAlias = Union[
+    _IndexIterScalar, "Interval", complex, np.integer, np.floating, np.complexfloating
+]
+
 IntStrT = TypeVar("IntStrT", bound=int | str)

 # timestamp and timedelta convertible types
@@ -312,6 +331,9 @@ def closed(self) -> bool:
 CompressionOptions: TypeAlias = (
     Literal["infer", "gzip", "bz2", "zip", "xz", "zstd", "tar"] | CompressionDict | None
 )
+ParquetCompressionOptions: TypeAlias = (
+    Literal["snappy", "gzip", "brotli", "lz4", "zstd"] | None
+)

 # types in DataFrameFormatter
 FormattersType: TypeAlias = (
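
The effect of narrowing an annotation from `str | None` to a `Literal` union, as `ParquetCompressionOptions` does, can be sketched with the standard library alone. The names below are local stand-ins that mirror the alias in this diff and are not imported from pandas; `normalize_compression` is a hypothetical helper. With the narrower annotation, a static type checker would be expected to flag a misspelled codec at the call site. `Literal` is not enforced at runtime, so the sketch adds an explicit check through `typing.get_args`.

```python
from typing import Literal, Optional, get_args

# Local stand-ins mirroring the alias added above; NOT imported from pandas.
ParquetCodec = Literal["snappy", "gzip", "brotli", "lz4", "zstd"]
ParquetCompressionOptions = Optional[ParquetCodec]


def normalize_compression(value: Optional[str]) -> ParquetCompressionOptions:
    """Hypothetical helper: return ``value`` if it is an allowed codec or None."""
    if value is None:
        return None
    # get_args() yields the literal values as a tuple of strings.
    if value in get_args(ParquetCodec):
        return value  # type: ignore[return-value]
    raise ValueError(f"unsupported parquet compression: {value!r}")


print(normalize_compression("zstd"))  # prints: zstd
print(normalize_compression(None))    # prints: None
```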

pandas/api/typing/aliases.py

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
+from pandas._typing import (
+    AggFuncType,
+    AlignJoin,
+    AnyAll,
+    AnyArrayLike,
+    ArrayLike,
+    AstypeArg,
+    Axes,
+    Axis,
+    ColspaceArgType,
+    CompressionOptions,
+    CorrelationMethod,
+    CSVEngine,
+    DropKeep,
+    Dtype,
+    DtypeArg,
+    DtypeBackend,
+    DtypeObj,
+    ExcelWriterIfSheetExists,
+    ExcelWriterMergeCells,
+    FilePath,
+    FillnaOptions,
+    FloatFormatType,
+    FormattersType,
+    FromDictOrient,
+    HTMLFlavors,
+    IgnoreRaise,
+    IndexLabel,
+    InterpolateOptions,
+    JoinHow,
+    JoinValidate,
+    JSONEngine,
+    JSONSerializable,
+    MergeHow,
+    MergeValidate,
+    NaPosition,
+    NsmallestNlargestKeep,
+    OpenFileErrors,
+    Ordered,
+    ParquetCompressionOptions,
+    QuantileInterpolation,
+    ReadBuffer,
+    ReadCsvBuffer,
+    ReadPickleBuffer,
+    ReindexMethod,
+    Scalar,
+    SequenceNotStr,
+    SliceType,
+    SortKind,
+    StorageOptions,
+    Suffixes,
+    TakeIndexer,
+    TimeAmbiguous,
+    TimedeltaConvertibleTypes,
+    TimeGrouperOrigin,
+    TimeNonexistent,
+    TimestampConvertibleTypes,
+    TimeUnit,
+    ToStataByteorder,
+    ToTimestampHow,
+    UpdateJoin,
+    UsecolsArgType,
+    WindowingRankType,
+    WriteBuffer,
+    WriteExcelBuffer,
+    XMLParsers,
+)
+
+__all__ = [
+    "AggFuncType",
+    "AlignJoin",
+    "AnyAll",
+    "AnyArrayLike",
+    "ArrayLike",
+    "AstypeArg",
+    "Axes",
+    "Axis",
+    "CSVEngine",
+    "ColspaceArgType",
+    "CompressionOptions",
+    "CorrelationMethod",
+    "DropKeep",
+    "Dtype",
+    "DtypeArg",
+    "DtypeBackend",
+    "DtypeObj",
+    "ExcelWriterIfSheetExists",
+    "ExcelWriterMergeCells",
+    "FilePath",
+    "FillnaOptions",
+    "FloatFormatType",
+    "FormattersType",
+    "FromDictOrient",
+    "HTMLFlavors",
+    "IgnoreRaise",
+    "IndexLabel",
+    "InterpolateOptions",
+    "JSONEngine",
+    "JSONSerializable",
+    "JoinHow",
+    "JoinValidate",
+    "MergeHow",
+    "MergeValidate",
+    "NaPosition",
+    "NsmallestNlargestKeep",
+    "OpenFileErrors",
+    "Ordered",
+    "ParquetCompressionOptions",
+    "QuantileInterpolation",
+    "ReadBuffer",
+    "ReadCsvBuffer",
+    "ReadPickleBuffer",
+    "ReindexMethod",
+    "Scalar",
+    "SequenceNotStr",
+    "SliceType",
+    "SortKind",
+    "StorageOptions",
+    "Suffixes",
+    "TakeIndexer",
+    "TimeAmbiguous",
+    "TimeGrouperOrigin",
+    "TimeNonexistent",
+    "TimeUnit",
+    "TimedeltaConvertibleTypes",
+    "TimestampConvertibleTypes",
+    "ToStataByteorder",
+    "ToTimestampHow",
+    "UpdateJoin",
+    "UsecolsArgType",
+    "WindowingRankType",
+    "WriteBuffer",
+    "WriteExcelBuffer",
+    "XMLParsers",
+]
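
The module above is a pure re-export: every imported name is expected to appear in `__all__`. The import block is sorted case-insensitively (`ColspaceArgType` before `CSVEngine`) while `__all__` follows plain string order (`CSVEngine` before `ColspaceArgType`), so a consistency check is easier to write against sets than against lists. The following is a minimal sketch of such a check using the standard `ast` module. The `SOURCE` string is an abbreviated, illustrative stand-in for the real file, and `imported_and_exported` is a hypothetical helper, not something this commit adds.

```python
import ast

# Abbreviated, illustrative module source; the real file lists many more names.
SOURCE = '''
from pandas._typing import (
    AggFuncType,
    ColspaceArgType,
    CSVEngine,
)

__all__ = [
    "AggFuncType",
    "CSVEngine",
    "ColspaceArgType",
]
'''


def imported_and_exported(source: str) -> tuple[set[str], set[str]]:
    """Return (names bound by ``from ... import``, names listed in ``__all__``)."""
    imported: set[str] = set()
    exported: set[str] = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.ImportFrom):
            imported.update(alias.asname or alias.name for alias in node.names)
        elif isinstance(node, ast.Assign) and any(
            isinstance(target, ast.Name) and target.id == "__all__"
            for target in node.targets
        ):
            # __all__ is a list of string literals, so literal_eval is safe here.
            exported.update(ast.literal_eval(node.value))
    return imported, exported


imported, exported = imported_and_exported(SOURCE)
print(imported == exported)  # prints: True
```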

pandas/core/dtypes/cast.py

Lines changed: 1 addition & 1 deletion
@@ -599,7 +599,7 @@ def _maybe_promote(dtype: np.dtype, fill_value=np.nan):
             dtype = np.dtype(np.object_)

         elif issubclass(dtype.type, np.integer):
-            if not np_can_cast_scalar(fill_value, dtype):  # type: ignore[arg-type]
+            if not np_can_cast_scalar(fill_value, dtype):
                 # upcast to prevent overflow
                 mst = np.min_scalar_type(fill_value)
                 dtype = np.promote_types(dtype, mst)

pandas/core/frame.py

Lines changed: 4 additions & 3 deletions
@@ -240,6 +240,7 @@
         MutableMappingT,
         NaPosition,
         NsmallestNlargestKeep,
+        ParquetCompressionOptions,
         PythonFuncType,
         QuantileInterpolation,
         ReadBuffer,
@@ -2862,7 +2863,7 @@ def to_parquet(
         path: None = ...,
         *,
         engine: Literal["auto", "pyarrow", "fastparquet"] = ...,
-        compression: str | None = ...,
+        compression: ParquetCompressionOptions = ...,
         index: bool | None = ...,
         partition_cols: list[str] | None = ...,
         storage_options: StorageOptions = ...,
@@ -2875,7 +2876,7 @@ def to_parquet(
         path: FilePath | WriteBuffer[bytes],
         *,
         engine: Literal["auto", "pyarrow", "fastparquet"] = ...,
-        compression: str | None = ...,
+        compression: ParquetCompressionOptions = ...,
         index: bool | None = ...,
         partition_cols: list[str] | None = ...,
         storage_options: StorageOptions = ...,
@@ -2888,7 +2889,7 @@ def to_parquet(
         path: FilePath | WriteBuffer[bytes] | None = None,
         *,
         engine: Literal["auto", "pyarrow", "fastparquet"] = "auto",
-        compression: str | None = "snappy",
+        compression: ParquetCompressionOptions = "snappy",
         index: bool | None = None,
         partition_cols: list[str] | None = None,
         storage_options: StorageOptions | None = None,

pandas/io/parquet.py

Lines changed: 3 additions & 2 deletions
@@ -43,6 +43,7 @@
     from pandas._typing import (
         DtypeBackend,
         FilePath,
+        ParquetCompressionOptions,
         ReadBuffer,
         StorageOptions,
         WriteBuffer,
@@ -175,7 +176,7 @@ def write(
         self,
         df: DataFrame,
         path: FilePath | WriteBuffer[bytes],
-        compression: str | None = "snappy",
+        compression: ParquetCompressionOptions = "snappy",
         index: bool | None = None,
         storage_options: StorageOptions | None = None,
         partition_cols: list[str] | None = None,
@@ -411,7 +412,7 @@ def to_parquet(
     df: DataFrame,
     path: FilePath | WriteBuffer[bytes] | None = None,
     engine: str = "auto",
-    compression: str | None = "snappy",
+    compression: ParquetCompressionOptions = "snappy",
     index: bool | None = None,
     storage_options: StorageOptions | None = None,
     partition_cols: list[str] | None = None,
