From ad4d5d8156e53d4604ad153c5c69fc32977eb976 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 22 Mar 2023 17:09:18 -0600 Subject: [PATCH 1/6] Fix: quick fix to issue with complex package ordered list' --- package-structure-code/complex-python-package-builds.md | 1 - 1 file changed, 1 deletion(-) diff --git a/package-structure-code/complex-python-package-builds.md b/package-structure-code/complex-python-package-builds.md index 58183c4e3..5ab8b2e06 100644 --- a/package-structure-code/complex-python-package-builds.md +++ b/package-structure-code/complex-python-package-builds.md @@ -12,7 +12,6 @@ categories can in turn help you select the correct package front-end and back-end tools. 1. **Pure-python packages:** these are packages that only rely on Python to function. Building a pure Python package is simpler. As such, you can chose a tool below that has the features that you want and be done with your decision! - 2. **Python packages with non-Python extensions:** These packages have additional components called extensions written in other languages (such as C or C++). If you have a package with non-python extensions, then you need to select a build back-end tool that allows you to add additional build steps needed to compile your extension code. Further, if you wish to use a front-end tool to support your workflow, you will need to select a tool that supports additional build setups. In this case, you could use setuptools. However, we suggest that you chose build tool that supports custom build steps such as Hatch with Hatchling or PDM. PDM is an excellent choice as it allows you to also select your build back-end of choice. We will discuss this at a high level on the complex builds page. 3. **Python packages that have extensions written in different languages (e.g. Fortran and C++) or that have non Python dependencies that are difficult to install (e.g. 
GDAL)** These packages often have complex build steps (more complex than a package with just a few C extensions for instance). As such, these packages require tools such as [scikit-build](https://scikit-build.readthedocs.io/en/latest/) From ce3acd5468626610a7cd8a37f1b5e3a9c7cddfdf Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 22 Mar 2023 17:18:09 -0600 Subject: [PATCH 2/6] Fix ordered list --- package-structure-code/complex-python-package-builds.md | 1 + 1 file changed, 1 insertion(+) diff --git a/package-structure-code/complex-python-package-builds.md b/package-structure-code/complex-python-package-builds.md index 5ab8b2e06..58183c4e3 100644 --- a/package-structure-code/complex-python-package-builds.md +++ b/package-structure-code/complex-python-package-builds.md @@ -12,6 +12,7 @@ categories can in turn help you select the correct package front-end and back-end tools. 1. **Pure-python packages:** these are packages that only rely on Python to function. Building a pure Python package is simpler. As such, you can chose a tool below that has the features that you want and be done with your decision! + 2. **Python packages with non-Python extensions:** These packages have additional components called extensions written in other languages (such as C or C++). If you have a package with non-python extensions, then you need to select a build back-end tool that allows you to add additional build steps needed to compile your extension code. Further, if you wish to use a front-end tool to support your workflow, you will need to select a tool that supports additional build setups. In this case, you could use setuptools. However, we suggest that you chose build tool that supports custom build steps such as Hatch with Hatchling or PDM. PDM is an excellent choice as it allows you to also select your build back-end of choice. We will discuss this at a high level on the complex builds page. 3. **Python packages that have extensions written in different languages (e.g. 
Fortran and C++) or that have non Python dependencies that are difficult to install (e.g. GDAL)** These packages often have complex build steps (more complex than a package with just a few C extensions for instance). As such, these packages require tools such as [scikit-build](https://scikit-build.readthedocs.io/en/latest/) From 2f7f41b427fdc95c2468d9f001340de7ab090988 Mon Sep 17 00:00:00 2001 From: Simon <31246246+SimonMolinsky@users.noreply.github.com> Date: Fri, 24 Mar 2023 09:33:20 +0100 Subject: [PATCH 3/6] The type hinting tutorial part 1 --- .gitignore | 4 + .../document-your-code-api-docstrings.md | 104 ++++++++++++++++++ 2 files changed, 108 insertions(+) diff --git a/.gitignore b/.gitignore index 46e3d0d82..54529bf67 100644 --- a/.gitignore +++ b/.gitignore @@ -5,4 +5,8 @@ tmp/ .DS_Store .nox __pycache__ +<<<<<<< Updated upstream *notes-from-review.md +======= +*.idea* +>>>>>>> Stashed changes diff --git a/documentation/write-user-documentation/document-your-code-api-docstrings.md b/documentation/write-user-documentation/document-your-code-api-docstrings.md index 6b964357a..f8b0b1478 100644 --- a/documentation/write-user-documentation/document-your-code-api-docstrings.md +++ b/documentation/write-user-documentation/document-your-code-api-docstrings.md @@ -207,3 +207,107 @@ def add_me(aNum, aNum2): ``` + + +## Beyond docstrings: type hints + +We can use docstrings to describe data types that we pass into functions as parameters or +into classes as attributes. We do it with package users in mind. + +What with us – developers? We can think of ourselves and the new contributors +and start using *type hinting* to make our journey safer! + +There are solid reasons why to use type hints: + +- Development and debugging is faster, +- We clearly see data flow and its transformations, +- We can use tools like `mypy` or integrated tools of Python IDEs for static type checking and code debugging. 
+ +We should consider type hinting if our package performs data processing, +functions require complex inputs, and we want to lower the entrance barrier for our contributors. +The icing on the cake is that the code in our package will be aligned with the best industry standards. + +But there are reasons to *skip* type hinting: + +- Type hints may make code unreadable, especially when a parameter’s input takes multiple data types and we list them all, +- It doesn’t make sense to write type hints for simple scripts and functions that perform obvious operations. + +Fortunately for us, type hinting is not all black and white. +We can gradually describe the parameters and outputs of some functions but leave others as they are. +Type hinting can be an introductory task for new contributors in seasoned packages, +that way their learning curve about data flow and dependencies between API endpoints will be smoother. + +## Type hints in practice + +Type hinting was introduced with Python 3.5 and is described in [PEP 484](https://peps.python.org/pep-0484/). +**PEP 484** defines the scope of type hinting. Is Python drifting towards compiled languages with type hinting? +It is not. Type hints are optional and static and they will work like that in the future where Python is Python. +The power of type hints lies somewhere between docstrings and unit tests, and with it we can avoid many bugs +throughout development. + +We've seen type hints in the simple example earlier, and we will change it slightly: + + +```python +from typing import Dict, List + + +def extent_to_json(ext_obj: List) -> Dict: + """Convert bounds to a shapely geojson like spatial object.""" + pass +``` + +Here we focus on the new syntax. First, we have described the parameter `ext_obj` as the `List` class. How do we do it? By adding a colon after parameter and the name of a class that is passed into a function. It’s not over and we see that the function definition after closing parenthesis is expanded. 
If we want to inform type checker what type function returns, then we create the arrow sign `->` that points to a returned type and after it we have function’s colon. Our function returns Python dictonray (`Dict`). + +```{note} +We have exported classes `List` and `Dict` from `typing` module but we may use +`list` or `dict` keywords instead. We will achieve the same result. +Capitalized keywords are required when our package uses Python versions that are lower than +Python 3.9. Python 3.7 will be deprecated in June 2023, Python 3.8 in October 2024. +Thus, if your package supports the whole ecosystem, it should use `typing` module syntax. +``` + +### Type hints: basic example + +The best way to learn is by example. We will use the [pystiche](https://github.com/pystiche/pystiche/tree/main) package. +To avoid confusion, we start from a mathematical operation with basic data types: + +```python +import torch + + +def _norm(x: torch.Tensor, dim: int = 1, eps: float = 1e-8) -> torch.Tensor: + ... + +``` + +The function has three parameters: + +- `x` that is required and its type is `torch.Tensor`, +- `dim`, optional `int` with a default value equal to `1`, +- `eps`, optional `float` with a default value equal to `1e-8`. + +As we see, we can use basic data types to mark simple variables. The basic set of those types is: + +- `int`, +- `float`, +- `str`, +- `bool` +- `complex`. + +Most frequently we will use those types within a simple functions that are *close to data*. +However, sometimes our variable will be a data structure that isn't built-in within Python itself +but comes from other packages: + +- `Tensor` from `pytorch`, +- `ndarray` from `numpy`, +- `DataFrame` from `pandas`, +- `Session` from `requests`. + +To perform type checking we must import those classes, then we can set those as a parameter's type. +The same is true if we want to use classes from within our package (but we should avoid **circular imports**, +the topic that we will uncover later). 
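
As a small illustration of the rule above (import the class, then use it as a parameter's type), here is a minimal sketch. It is not taken from `pystiche`: the function `apply_discount` is hypothetical, and `Decimal` from the standard library stands in for a third-party class such as `ndarray` so that the snippet runs without extra installs.

```python
from decimal import Decimal


def apply_discount(price: Decimal, percent: int = 10) -> Decimal:
    """Return ``price`` reduced by ``percent`` percent (hypothetical helper)."""
    # ``price`` is annotated with an imported class, ``percent`` with a basic type.
    return price * (Decimal(100) - Decimal(percent)) / Decimal(100)


discounted = apply_discount(Decimal("80.00"), percent=25)
default_discount = apply_discount(Decimal("50"))
```

The annotations are not enforced when the code runs; a static checker such as `mypy` reads them to flag calls that pass the wrong type.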
+ +### Type hints: complex data types + + From 7b03eed350fb5d3bd2a27c4bd66c5d0f0cc6c7a3 Mon Sep 17 00:00:00 2001 From: Simon <31246246+SimonMolinsky@users.noreply.github.com> Date: Sat, 25 Mar 2023 13:01:18 +0100 Subject: [PATCH 4/6] pre-language corrections commit --- .../document-your-code-api-docstrings.md | 150 ++++++++++++++++++ 1 file changed, 150 insertions(+) diff --git a/documentation/write-user-documentation/document-your-code-api-docstrings.md b/documentation/write-user-documentation/document-your-code-api-docstrings.md index f8b0b1478..fb6bc7e05 100644 --- a/documentation/write-user-documentation/document-your-code-api-docstrings.md +++ b/documentation/write-user-documentation/document-your-code-api-docstrings.md @@ -310,4 +310,154 @@ the topic that we will uncover later). ### Type hints: complex data types +We can use type hints to describe other objects available in Python. +The little sample of those objects are: + +- `List` (= `list`) +- `Dict` (= `dict`) +- `Tuple` (= `tuple`) +- `Set` (= `set`) + +How do `pystiche` developers use those objects in their code? Let's take a look at the example below: + +```python +from typing import List, Optional +import torch + + +def _extract_prev(self, idx: int, idcs: List[int]) -> Optional[str]: + ... + +``` + +The function has two parameters. Parameter `idcs` is a list of integers. We may write it as `List[int]` or `List` without +square brackets and data type that is within a list. + +The `_extract_prev` function returns `Optional` type. It is a special type that is used to describe inputs and output +that can be `None`. There are more interesting types that we can use in our code: + +- `Union` – we can use it to describe a variable that can be of multiple types, the common example could be: + +```python +from typing import List, Union +import numpy as np +import pandas as pd + + +def process_data(data: Union[np.ndarray, pd.DataFrame, List]) -> np.ndarray: + ... 
+ +``` + +What's the problem with the example above? With more data types that can be passed into parameter `data`, the function definition +becomes unreadable. We have two solutions for this issue. The first one is to use `Any` type that is a wildcard type: + +```python +from typing import Any + + +def process_data(data: Any) -> np.ndarray: + ... + +``` + +The second solution is to think what is a high level representation of passed data types. The examples are: + +- `Sequence` – we can use it to describe a variable that is a sequence of elements. Sequential are `list`, `tuple`, `range` and `str`. +- `Iterable` – we can use it to describe a variable that is iterable. Iterables are `list`, `tuple`, `range`, `str`, `dict` and `set`. +- `Mapping` – we can use it to describe a variable that is a mapping. Mappings are `dict` and `defaultdict`. +- `Hashable` – we can use it to describe a variable that is hashable. Hashables are `int`, `float`, `str`, `tuple` and `frozenset`. +- `Collection` - we can use it to describe a variable that is a collection. Collections are `list`, `tuple`, `range`, `str`, `dict`, `set` and `frozenset`. + +Thus, the function could look like: + +```python +from typing import Iterable + + +def process_data(data: Iterable) -> np.ndarray: + ... + +``` + +### Type hints: special typing objects + +The `typing` module provides us with more objects that we can use to describe our variables. +Interesting object is `Callable` that we can use to describe a variable that is a function. Usually, +when we write decorators or wrappers, we use `Callable` type. The example in the context of `pystiche` package: + +```python +from typing import Callable + + +def _deprecate(fn: Callable) -> Callable: + ... + + +``` + +The `Callable`can be used as a single word or as a word with square brackets that has two parameters: `Callable[[arg1, arg2], return_type]`. +The first parameter is a list of arguments, the second one is a return type. 
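
To make the bracketed form concrete, the sketch below annotates a small wrapper with `Callable[[int, int], int]`, meaning a function that takes two `int` arguments and returns an `int`. The names `log_calls`, `add` and `calls` are made up for illustration and do not come from `pystiche`.

```python
from typing import Callable, List


def log_calls(fn: Callable[[int, int], int], log: List[str]) -> Callable[[int, int], int]:
    """Wrap ``fn`` so that every call appends one line to ``log``."""

    def wrapper(a: int, b: int) -> int:
        result = fn(a, b)
        log.append(f"{fn.__name__}({a}, {b}) -> {result}")
        return result

    return wrapper


def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


calls: List[str] = []
logged_add = log_calls(add, calls)
total = logged_add(2, 3)
```

Spelling out the argument and return types lets a type checker flag a wrapped function whose signature does not match, which a bare `Callable` cannot do.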
+ +There is one more important case around type hints. Sometimes we want to describe a variable that comes from within +our package. Usually we can do it without any problems: + +```python +from my_package import my_data_class + + +def my_function(data: my_data_class) -> None: + ... + +``` + +and it will work fine. But we may encounter *circual imports* that are a problem. What is a *circular import*? +It is a case when we want to import module B into module A but module A is already imported into module B. +It seems like we are importing the same module twice into itself. The issue is rare when we program without type +hinting. However, with type hints it could be tedious. + +Thus, if you encounter this error: + +```python +from my_package import my_data_class + + +def my_function(data: my_data_class) -> None: + ... + +``` + +```shell +ImportError: cannot import name 'my_data_class' from partially initialized module 'my_package' (most likely due to a circular import) (/home/user/my_package/__init__.py) +``` + +Then you should use `typing.TYPE_CHECKING` clause to avoid circular imports. The example: + +```python +from __future__ import annotations +from typing import TYPE_CHECKING + +if TYPE_CHECKING: + from my_package import my_data_class + + +def my_function(data: my_data_class) -> None: + ... + +``` + +Unfortunately, the solution is dirty because we have to +use `if TYPE_CHECKING` clause and `from __future__ import annotations` import to make it work! Type hinting +is not only roses and butterflies! + +### Type hinting: final remarks and tools + +There are few tools designed for static type checking. The most popular one is [`mypy`](https://mypy.readthedocs.io/en/stable/). +It's a good idea to add it to your Continuous Integration (CI) pipeline. +Other tools are integrated with popular IDEs like `PyCharm` or `VSCode`, most of them are based on `mypy` logic. + +At this point, we have a good understanding of type hints and how to use them in our code. 
There is one last thing to +remember. **Type hints are not required in all our functions and we can introduce those gradually, it won't damage our code**. +It is very convenient way of using this extraordinary feature! + From 1f2548da61cf585e84d4009ad9898c49fccd070b Mon Sep 17 00:00:00 2001 From: Simon <31246246+SimonMolinsky@users.noreply.github.com> Date: Sat, 25 Mar 2023 18:26:07 +0100 Subject: [PATCH 5/6] cleaned and final update of type hints --- .gitignore | 3 - .../document-your-code-api-docstrings.md | 157 +++++++++++------- 2 files changed, 101 insertions(+), 59 deletions(-) diff --git a/.gitignore b/.gitignore index 54529bf67..49fb9bc99 100644 --- a/.gitignore +++ b/.gitignore @@ -5,8 +5,5 @@ tmp/ .DS_Store .nox __pycache__ -<<<<<<< Updated upstream *notes-from-review.md -======= *.idea* ->>>>>>> Stashed changes diff --git a/documentation/write-user-documentation/document-your-code-api-docstrings.md b/documentation/write-user-documentation/document-your-code-api-docstrings.md index fb6bc7e05..3607f3e20 100644 --- a/documentation/write-user-documentation/document-your-code-api-docstrings.md +++ b/documentation/write-user-documentation/document-your-code-api-docstrings.md @@ -211,15 +211,15 @@ def add_me(aNum, aNum2): ## Beyond docstrings: type hints -We can use docstrings to describe data types that we pass into functions as parameters or -into classes as attributes. We do it with package users in mind. +We use docstrings to describe data types that we pass into functions as parameters or +into classes as attributes. *We do it with our users in mind.* -What with us – developers? We can think of ourselves and the new contributors +**What with us – developers?** We can think of ourselves and the new contributors, and start using *type hinting* to make our journey safer! 
There are solid reasons why to use type hints: -- Development and debugging is faster, +- Development and debugging are faster, - We clearly see data flow and its transformations, - We can use tools like `mypy` or integrated tools of Python IDEs for static type checking and code debugging. @@ -230,22 +230,22 @@ The icing on the cake is that the code in our package will be aligned with the b But there are reasons to *skip* type hinting: - Type hints may make code unreadable, especially when a parameter’s input takes multiple data types and we list them all, -- It doesn’t make sense to write type hints for simple scripts and functions that perform obvious operations. +- Writing type hints for simple scripts and functions that perform obvious operations doesn't make sense. Fortunately for us, type hinting is not all black and white. We can gradually describe the parameters and outputs of some functions but leave others as they are. -Type hinting can be an introductory task for new contributors in seasoned packages, -that way their learning curve about data flow and dependencies between API endpoints will be smoother. +Type hinting can be a task for new contributors to get them used to the package structure. +That way, their learning curve about data flow and dependencies between API endpoints will be smoother. ## Type hints in practice Type hinting was introduced with Python 3.5 and is described in [PEP 484](https://peps.python.org/pep-0484/). -**PEP 484** defines the scope of type hinting. Is Python drifting towards compiled languages with type hinting? -It is not. Type hints are optional and static and they will work like that in the future where Python is Python. -The power of type hints lies somewhere between docstrings and unit tests, and with it we can avoid many bugs +**PEP 484** defines the scope of type hinting. Is Python drifting towards compiled languages with this feature? +It is not. Type hints are optional and static. 
They will work like that in the future until Python is Python. +The power of type hints lies somewhere between docstrings and unit tests, and with it, we can avoid many bugs throughout development. -We've seen type hints in the simple example earlier, and we will change it slightly: +We've seen type hints in the simple example earlier. Let's come back to it and change it slightly: ```python @@ -254,23 +254,28 @@ from typing import Dict, List def extent_to_json(ext_obj: List) -> Dict: """Convert bounds to a shapely geojson like spatial object.""" - pass + ... + ``` -Here we focus on the new syntax. First, we have described the parameter `ext_obj` as the `List` class. How do we do it? By adding a colon after parameter and the name of a class that is passed into a function. It’s not over and we see that the function definition after closing parenthesis is expanded. If we want to inform type checker what type function returns, then we create the arrow sign `->` that points to a returned type and after it we have function’s colon. Our function returns Python dictonray (`Dict`). +Here we focus on the new syntax. First, we described the parameter `ext_obj` as the `List` class. How do we do it? +Add a colon after the parameter (variable) and the name of a class that is passed into a function. +It’s not over. Do you see, that the function definition after closing parenthesis is expanded? +If we want to inform the type checker what the function returns, then we create the arrow sign `->` that points to a returned type, +and after it, we put the function’s colon. Our function returns a Python dictionary (`Dict`). ```{note} -We have exported classes `List` and `Dict` from `typing` module but we may use +We have exported classes `List` and `Dict` from the `typing` module, but we may use `list` or `dict` keywords instead. We will achieve the same result. Capitalized keywords are required when our package uses Python versions that are lower than -Python 3.9. 
Python 3.7 will be deprecated in June 2023, Python 3.8 in October 2024. -Thus, if your package supports the whole ecosystem, it should use `typing` module syntax. +Python 3.9. Python 3.7 will be deprecated in June 2023, and Python 3.8 in October 2024. +Thus, if your package supports the whole ecosystem, it should use the `typing` module syntax. ``` ### Type hints: basic example The best way to learn is by example. We will use the [pystiche](https://github.com/pystiche/pystiche/tree/main) package. -To avoid confusion, we start from a mathematical operation with basic data types: +To avoid confusion, we start with a mathematical operation: ```python import torch @@ -283,7 +288,7 @@ def _norm(x: torch.Tensor, dim: int = 1, eps: float = 1e-8) -> torch.Tensor: The function has three parameters: -- `x` that is required and its type is `torch.Tensor`, +- `x` that is required, and its type is `torch.Tensor`, - `dim`, optional `int` with a default value equal to `1`, - `eps`, optional `float` with a default value equal to `1e-8`. @@ -295,7 +300,7 @@ As we see, we can use basic data types to mark simple variables. The basic set o - `bool` - `complex`. -Most frequently we will use those types within a simple functions that are *close to data*. +We will most frequently use those types within simple functions that are *close to data*. However, sometimes our variable will be a data structure that isn't built-in within Python itself but comes from other packages: @@ -304,14 +309,14 @@ but comes from other packages: - `DataFrame` from `pandas`, - `Session` from `requests`. -To perform type checking we must import those classes, then we can set those as a parameter's type. +To perform type checking, we must import those classes. Then we can set those as a parameter's type. The same is true if we want to use classes from within our package (but we should avoid **circular imports**, -the topic that we will uncover later). +the topic we will uncover later). 
### Type hints: complex data types We can use type hints to describe other objects available in Python. -The little sample of those objects are: +A little sample of those objects are: - `List` (= `list`) - `Dict` (= `dict`) @@ -330,13 +335,13 @@ def _extract_prev(self, idx: int, idcs: List[int]) -> Optional[str]: ``` -The function has two parameters. Parameter `idcs` is a list of integers. We may write it as `List[int]` or `List` without +The function has two parameters. The parameter `idcs` is a list of integers. We may write it as `List[int]` or `List` without square brackets and data type that is within a list. -The `_extract_prev` function returns `Optional` type. It is a special type that is used to describe inputs and output +The `_extract_prev` function returns the `Optional` type. It is a special type that describes inputs and output that can be `None`. There are more interesting types that we can use in our code: -- `Union` – we can use it to describe a variable that can be of multiple types, the common example could be: +- `Union` – we can use it to describe a variable of multiple types. An example could be: ```python from typing import List, Union @@ -349,8 +354,8 @@ def process_data(data: Union[np.ndarray, pd.DataFrame, List]) -> np.ndarray: ``` -What's the problem with the example above? With more data types that can be passed into parameter `data`, the function definition -becomes unreadable. We have two solutions for this issue. The first one is to use `Any` type that is a wildcard type: +What's the problem with the example above? The function definition becomes unreadable with more data types passed into the parameter `data`. +We have two solutions for this issue. The first one is to use the `Any` type, which is a wildcard that is equal to not passing any type. 
```python from typing import Any @@ -361,15 +366,15 @@ def process_data(data: Any) -> np.ndarray: ``` -The second solution is to think what is a high level representation of passed data types. The examples are: +The second solution is to think what is a high-level representation of a passed data type. The examples are: -- `Sequence` – we can use it to describe a variable that is a sequence of elements. Sequential are `list`, `tuple`, `range` and `str`. -- `Iterable` – we can use it to describe a variable that is iterable. Iterables are `list`, `tuple`, `range`, `str`, `dict` and `set`. +- `Sequence` – we can use it to describe a variable as a sequence of elements. Sequential are `list`, `tuple`, `range` and `str`. +- `Iterable` – we can use it to describe an iterable variable. Iterables are `list`, `tuple`, `range`, `str`, `dict` and `set`. - `Mapping` – we can use it to describe a variable that is a mapping. Mappings are `dict` and `defaultdict`. -- `Hashable` – we can use it to describe a variable that is hashable. Hashables are `int`, `float`, `str`, `tuple` and `frozenset`. -- `Collection` - we can use it to describe a variable that is a collection. Collections are `list`, `tuple`, `range`, `str`, `dict`, `set` and `frozenset`. +- `Hashable` – we can use it to describe a hashable variable. Hashables are `int`, `float`, `str`, `tuple` and `frozenset`. +- `Collection` - we can use it to describe a collection variable. Collections are `list`, `tuple`, `range`, `str`, `dict`, `set` and `frozenset`. -Thus, the function could look like: +Thus, the function could look like this: ```python from typing import Iterable @@ -380,11 +385,11 @@ def process_data(data: Iterable) -> np.ndarray: ``` -### Type hints: special typing objects +### Type hints: unique objects and interesting cases The `typing` module provides us with more objects that we can use to describe our variables. -Interesting object is `Callable` that we can use to describe a variable that is a function. 
Usually, -when we write decorators or wrappers, we use `Callable` type. The example in the context of `pystiche` package: +An interesting object is `Callable` that we can use to describe a variable that is a function. Usually, +when we write decorators or wrappers, we use the `Callable` type. The example in the context of the `pystiche` package: ```python from typing import Callable @@ -393,14 +398,13 @@ from typing import Callable def _deprecate(fn: Callable) -> Callable: ... - ``` -The `Callable`can be used as a single word or as a word with square brackets that has two parameters: `Callable[[arg1, arg2], return_type]`. -The first parameter is a list of arguments, the second one is a return type. +`Callable` can be used on its own or with square brackets that take two parameters: `Callable[[arg1, arg2], return_type]`. +The first parameter is a list of argument types, and the second is the function's return type. -There is one more important case around type hints. Sometimes we want to describe a variable that comes from within -our package. Usually we can do it without any problems: +There is an important case around type hints. Sometimes we want to describe a variable that comes from within +our package. Usually, we can do it without problems: ```python from my_package import my_data_class @@ -411,10 +415,10 @@ def my_function(data: my_data_class) -> None: ``` -and it will work fine. But we may encounter *circual imports* that are a problem. What is a *circular import*? -It is a case when we want to import module B into module A but module A is already imported into module B. -It seems like we are importing the same module twice into itself. The issue is rare when we program without type -hinting. However, with type hints it could be tedious. +And it will work fine. But we may encounter *circular imports* that need to be fixed. What is a *circular import*? 
+It is a case when we want to import module B into module A, but module A is already imported into module B. +In effect, the two modules end up importing each other. The issue is rare when we program without type +hinting. However, with type hints, it could be tedious. Thus, if you encounter this error: @@ -431,12 +435,13 @@ def my_function(data: my_data_class) -> None: ImportError: cannot import name 'my_data_class' from partially initialized module 'my_package' (most likely due to a circular import) (/home/user/my_package/__init__.py) ``` -Then you should use `typing.TYPE_CHECKING` clause to avoid circular imports. The example: +Then you should use the `typing.TYPE_CHECKING` clause to avoid circular imports. The example: ```python from __future__ import annotations from typing import TYPE_CHECKING + if TYPE_CHECKING: from my_package import my_data_class @@ -446,18 +451,58 @@ def my_function(data: my_data_class) -> None: ``` -Unfortunately, the solution is dirty because we have to -use `if TYPE_CHECKING` clause and `from __future__ import annotations` import to make it work! Type hinting -is not only roses and butterflies! +Unfortunately, the solution is *dirty* because we have to +use the `if TYPE_CHECKING` clause and `from __future__ import annotations` import to make it work. It makes our +script messier! Type hinting is not all roses and butterflies! + +A nice feature of type hinting is that we can define a variable's type within a function: + +```python +from typing import Dict +import numpy as np + + +def validate_model_input(data: np.ndarray) -> Dict: + """ + Function checks if dataset has enough records to perform modeling. + + Parameters + ---------- + data : np.ndarray + Input data. + + Returns + ------- + : Dict + Dictionary with `data`, `info` and `status` to decide if pipeline can proceed with modeling. 
+ """ + + output: Dict = None # type hinting + + # Probably we don't have the lines below yet + + # if data.shape[0] > 50: + # output = {"data": data, "info": "Dataset is big enough for statistical tests.", "status": True} + # else: + # output = {"data": data, "info": "Dataset is too small for statistical tests.", "status": False} + + return output + +``` + +We will use this feature rarely. The most probable scenario is when we start defining a function and its output, but +we don't know how we will process data. In this context, we can still run type checking to be sure that the +function behaves as we expect within the newly designed pipeline. -### Type hinting: final remarks and tools +(Another scenario: we will be forced to add type hints to silence dynamic type checkers from some IDEs ;) ). -There are few tools designed for static type checking. The most popular one is [`mypy`](https://mypy.readthedocs.io/en/stable/). -It's a good idea to add it to your Continuous Integration (CI) pipeline. -Other tools are integrated with popular IDEs like `PyCharm` or `VSCode`, most of them are based on `mypy` logic. -At this point, we have a good understanding of type hints and how to use them in our code. There is one last thing to -remember. **Type hints are not required in all our functions and we can introduce those gradually, it won't damage our code**. -It is very convenient way of using this extraordinary feature! +### Type hinting: final remarks +There are tools designed for static type checking. The most popular one is [`mypy`](https://mypy.readthedocs.io/en/stable/). +Adding it to your Continuous Integration (CI) pipeline is a good idea. +Other tools are integrated with popular IDEs like `PyCharm` or `VSCode`; most are based on `mypy` logic. +The last thing to remember is that **type hints are optional in all our functions, and we can introduce them gradually, +which won't damage our code and output generated by CI type checking tools**. 
+It is a very convenient way of using this extraordinary feature! From b17a76fe4b3c3741748c475ccdb8b38294e73ac0 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 4 Apr 2023 13:00:03 -0600 Subject: [PATCH 6/6] Fix: small edits to type hint section Update documentation/write-user-documentation/document-your-code-api-docstrings.md Update documentation/write-user-documentation/document-your-code-api-docstrings.md Update documentation/write-user-documentation/document-your-code-api-docstrings.md Update documentation/write-user-documentation/document-your-code-api-docstrings.md Update documentation/write-user-documentation/document-your-code-api-docstrings.md Edits to type hint section --- .../document-your-code-api-docstrings.md | 347 ++++-------------- 1 file changed, 62 insertions(+), 285 deletions(-) diff --git a/documentation/write-user-documentation/document-your-code-api-docstrings.md b/documentation/write-user-documentation/document-your-code-api-docstrings.md index 3607f3e20..10bb75f86 100644 --- a/documentation/write-user-documentation/document-your-code-api-docstrings.md +++ b/documentation/write-user-documentation/document-your-code-api-docstrings.md @@ -1,6 +1,7 @@ # Document the code in your package's API using docstrings ## What is an API? + API stands for **A**pplied **P**rogramming **I**nterface. When discussed in the context of a (Python) package, the API refers to the functions, methods and classes that a package maintainer creates for users. @@ -14,13 +15,14 @@ using the package's API. Package APIs consist of functions and/or classes, methods and attributes that create a user interface (known as the API). ## What is a docstring and how does it relate to documentation? + In Python a docstring refers to text in a function, method or class that describes what the function does and its inputs and outputs. 
Python programmers usually refer to the inputs to functions as ["parameters"](https://docs.python.org/3/glossary.html#term-parameter) or ["arguments"](https://docs.python.org/3/faq/programming.html#faq-argument-vs-parameter), and the outputs are often called "return values" The docstring is thus important for: -* When you call `help()` in Python, for example, `help(add_numbers)`, the text of the function's docstring is printed. The docstring thus helps a user better understand how to applying the function more effectively to their workflow. -* When you build your package's documentation, the docstrings can be also used to automagically create full API documentation that provides a clean view of all its functions, methods, attributes, and classes. +- When you call `help()` in Python, for example, `help(add_numbers)`, the text of the function's docstring is printed. The docstring thus helps a user better understand how to apply the function more effectively in their workflow. +- When you build your package's documentation, the docstrings can also be used to automagically create full API documentation that provides a clean view of all its functions, methods, attributes, and classes. ```{tip} Example API Documentation for all functions, methods, attributes and classes in a package. @@ -31,31 +33,32 @@ Example API Documentation for all functions, methods, attributes and classes in ## Python package API documentation If you have a descriptive docstring for every user-facing -class, method, attribute and/or function in your package (*within reason*), then your package's API is considered well-documented. +class, method, attribute and/or function in your package (_within reason_), then your package's API is considered well-documented. 
In Python, this means that you need to add a docstring for every user-facing -class, method, attribute and/or function in your package (*within reason*) that: +class, method, attribute and/or function in your package (_within reason_) that: -* Explains what the function, method, attribute or class does -* Defines the `type` inputs and outputs (ie. `string`, `int`, `np.array`) -* Explains the expected output `return` of the object, method or function. +- Explains what the function, method, attribute or class does +- Defines the `type` inputs and outputs (ie. `string`, `int`, `np.array`) +- Explains the expected output `return` of the object, method or function. ### Three Python docstring formats and why we like NumPy style There are several Python docstring formats that you can chose to use when documenting your package including: -* [NumPy-style](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard) -* [google style](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) -* [reST style](https://sphinx-rtd-tutorial.readthedocs.io/en/latest/docstrings.html) +- [NumPy-style](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard) +- [google style](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) +- [reST style](https://sphinx-rtd-tutorial.readthedocs.io/en/latest/docstrings.html) + We suggest using [NumPy-style docstrings](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard) for your Python documentation because: -* NumPy style docstrings are core to the scientific Python ecosystem and defined in the [NumPy style guide](https://numpydoc.readthedocs.io/en/latest/format.html). Thus you will find them widely used there. -* The Numpy style docstring is simplified and thus easier-to-read both in the code and when calling `help()` in Python. In contrast, some feel that reST style docstrings is harder to quickly scan, and can take up more lines of code in modules. 
+- NumPy style docstrings are core to the scientific Python ecosystem and defined in the [NumPy style guide](https://numpydoc.readthedocs.io/en/latest/format.html). Thus you will find them widely used there. +- The Numpy style docstring is simplified and thus easier-to-read both in the code and when calling `help()` in Python. In contrast, some feel that reST style docstrings is harder to quickly scan, and can take up more lines of code in modules. ```{tip} If you are using NumPy style docstrings, be sure to include the [sphinx napoleon @@ -96,7 +99,9 @@ def extent_to_json(ext_obj): ``` + (docstring_best_practice)= + ### Best: a docstring with example use of the function This example contains an example of using the function that is also tested in @@ -156,10 +161,11 @@ output of the `es.extent_to_json(rmnp)` command can even be tested using doctest adding another quality check to your package. ``` - ## Using doctest to run docstring examples in your package's methods and functions + + Above, we provided some examples of good, better, best docstring formats. If you are using Sphinx to create your docs, you can add the [doctest](https://www.sphinx-doc.org/en/master/usage/extensions/doctest.html) extension to your Sphinx build. Doctest provides an additional; check for docstrings with example code in them. Doctest runs the example code in your docstring `Examples` checking that the expected output is correct. Similar to running @@ -179,7 +185,6 @@ Below is an example of a docstring with an example. doctest will run the example below and test that if you provide `add_me` with the values 1 and 3 it will return 4. - ```python def add_me(aNum, aNum2): """A function that prints a number that it is provided. @@ -208,301 +213,73 @@ def add_me(aNum, aNum2): ``` +## Adding type hints to your docstrings -## Beyond docstrings: type hints - -We use docstrings to describe data types that we pass into functions as parameters or -into classes as attributes. 
*We do it with our users in mind.* - -**What with us – developers?** We can think of ourselves and the new contributors, -and start using *type hinting* to make our journey safer! - -There are solid reasons why to use type hints: - -- Development and debugging are faster, -- We clearly see data flow and its transformations, -- We can use tools like `mypy` or integrated tools of Python IDEs for static type checking and code debugging. - -We should consider type hinting if our package performs data processing, -functions require complex inputs, and we want to lower the entrance barrier for our contributors. -The icing on the cake is that the code in our package will be aligned with the best industry standards. - -But there are reasons to *skip* type hinting: - -- Type hints may make code unreadable, especially when a parameter’s input takes multiple data types and we list them all, -- Writing type hints for simple scripts and functions that perform obvious operations don't make sense. - -Fortunately for us, type hinting is not all black and white. -We can gradually describe the parameters and outputs of some functions but leave others as they are. -Type hinting can be a task for new contributors to get them used to the package structure. -That way, their learning curve about data flow and dependencies between API endpoints will be smoother. - -## Type hints in practice - -Type hinting was introduced with Python 3.5 and is described in [PEP 484](https://peps.python.org/pep-0484/). -**PEP 484** defines the scope of type hinting. Is Python drifting towards compiled languages with this feature? -It is not. Type hints are optional and static. They will work like that in the future until Python is Python. -The power of type hints lies somewhere between docstrings and unit tests, and with it, we can avoid many bugs -throughout development. - -We've seen type hints in the simple example earlier. 
Let's come back to it and change it slightly: - - -```python -from typing import Dict, List - - -def extent_to_json(ext_obj: List) -> Dict: - """Convert bounds to a shapely geojson like spatial object.""" - ... - -``` - -Here we focus on the new syntax. First, we described the parameter `ext_obj` as the `List` class. How do we do it? -Add a colon after the parameter (variable) and the name of a class that is passed into a function. -It’s not over. Do you see, that the function definition after closing parenthesis is expanded? -If we want to inform the type checker what the function returns, then we create the arrow sign `->` that points to a returned type, -and after it, we put the function’s colon. Our function returns a Python dictionary (`Dict`). - -```{note} -We have exported classes `List` and `Dict` from the `typing` module, but we may use -`list` or `dict` keywords instead. We will achieve the same result. -Capitalized keywords are required when our package uses Python versions that are lower than -Python 3.9. Python 3.7 will be deprecated in June 2023, and Python 3.8 in October 2024. -Thus, if your package supports the whole ecosystem, it should use the `typing` module syntax. -``` - -### Type hints: basic example - -The best way to learn is by example. We will use the [pystiche](https://github.com/pystiche/pystiche/tree/main) package. -To avoid confusion, we start with a mathematical operation: - -```python -import torch - - -def _norm(x: torch.Tensor, dim: int = 1, eps: float = 1e-8) -> torch.Tensor: - ... - -``` - -The function has three parameters: - -- `x` that is required, and its type is `torch.Tensor`, -- `dim`, optional `int` with a default value equal to `1`, -- `eps`, optional `float` with a default value equal to `1e-8`. - -As we see, we can use basic data types to mark simple variables. The basic set of those types is: - -- `int`, -- `float`, -- `str`, -- `bool` -- `complex`. 
- -We will most frequently use those types within simple functions that are *close to data*. -However, sometimes our variable will be a data structure that isn't built-in within Python itself -but comes from other packages: - -- `Tensor` from `pytorch`, -- `ndarray` from `numpy`, -- `DataFrame` from `pandas`, -- `Session` from `requests`. - -To perform type checking, we must import those classes. Then we can set those as a parameter's type. -The same is true if we want to use classes from within our package (but we should avoid **circular imports**, -the topic we will uncover later). - -### Type hints: complex data types - -We can use type hints to describe other objects available in Python. -A little sample of those objects are: - -- `List` (= `list`) -- `Dict` (= `dict`) -- `Tuple` (= `tuple`) -- `Set` (= `set`) - -How do `pystiche` developers use those objects in their code? Let's take a look at the example below: - -```python -from typing import List, Optional -import torch - - -def _extract_prev(self, idx: int, idcs: List[int]) -> Optional[str]: - ... - -``` - -The function has two parameters. The parameter `idcs` is a list of integers. We may write it as `List[int]` or `List` without -square brackets and data type that is within a list. - -The `_extract_prev` function returns the `Optional` type. It is a special type that describes inputs and output -that can be `None`. There are more interesting types that we can use in our code: - -- `Union` – we can use it to describe a variable of multiple types. An example could be: +In the example above, you saw the use of numpy-style docstrings to describe data types +that are passed into functions as parameters or +into classes as attributes. In a numpy-style docstring you add those +types in the Parameters section of the docstring. Below you can see that +the parameter `aNum` should be a Python `int` (integer) value. 
```python -from typing import List, Union -import numpy as np -import pandas as pd - - -def process_data(data: Union[np.ndarray, pd.DataFrame, List]) -> np.ndarray: - ... - -``` - -What's the problem with the example above? The function definition becomes unreadable with more data types passed into the parameter `data`. -We have two solutions for this issue. The first one is to use the `Any` type, which is a wildcard that is equal to not passing any type. - -```python -from typing import Any - - -def process_data(data: Any) -> np.ndarray: - ... - -``` - -The second solution is to think what is a high-level representation of a passed data type. The examples are: - -- `Sequence` – we can use it to describe a variable as a sequence of elements. Sequential are `list`, `tuple`, `range` and `str`. -- `Iterable` – we can use it to describe an iterable variable. Iterables are `list`, `tuple`, `range`, `str`, `dict` and `set`. -- `Mapping` – we can use it to describe a variable that is a mapping. Mappings are `dict` and `defaultdict`. -- `Hashable` – we can use it to describe a hashable variable. Hashables are `int`, `float`, `str`, `tuple` and `frozenset`. -- `Collection` - we can use it to describe a collection variable. Collections are `list`, `tuple`, `range`, `str`, `dict`, `set` and `frozenset`. - -Thus, the function could look like this: - -```python -from typing import Iterable - - -def process_data(data: Iterable) -> np.ndarray: - ... - + Parameters + ---------- + aNum : int + An integer value to be printed ``` -### Type hints: unique objects and interesting cases - -The `typing` module provides us with more objects that we can use to describe our variables. -An interesting object is `Callable` that we can use to describe a variable that is a function. Usually, -when we write decorators or wrappers, we use the `Callable` type. 
The example in the context of the `pystiche` package: - -```python -from typing import Callable - - -def _deprecate(fn: Callable) -> Callable: - ... - -``` +Describing the expected data type that a function or method requires +helps users better understand how to call a function or method. -The `Callable`can be used as a single word or a word with square brackets with two parameters: `Callable[[arg1, arg2], return_type]`. -The first parameter is a list of arguments, and the second is a function output's data type. +Type hints add another layer of type documentation to your code. Type hints +make it easier for new developers, your future self or contributors +to get to know your code base quickly. -There is an important case around type hints. Sometimes we want to describe a variable that comes from within -our package. Usually, we can do it without problems: +Type hints are added to the definition of your function. In the example below, the parameters `aNum` and `aNum2` are defined as being of type `int` (integer). ```python -from my_package import my_data_class - - -def my_function(data: my_data_class) -> None: - ... - +def add_me(aNum: int, aNum2: int): + """A function that prints a number that it is provided. ``` -And it will work fine. But we may encounter *circular imports* that need to be fixed. What is a *circular import*? -It is a case when we want to import module B into module A, but module A is already imported into module B. -We are importing the same module into itself. The issue is rare when we program without type -hinting. However, with type hints, it could be tedious. - -Thus, if you encounter this error: +You can further describe the expected function output using `->`. Below, +the output of the function is also an `int`. ```python -from my_package import my_data_class - - -def my_function(data: my_data_class) -> None: - ... 
- +def add_me(aNum: int, aNum2: int) -> int: + """A function that prints a number that it is provided. 
``` -```shell -ImportError: cannot import name 'my_data_class' from partially initialized module 'my_package' (most likely due to a circular import) (/home/user/my_package/__init__.py) -``` +### Why use type hints -Then you should use the `typing.TYPE_CHECKING` clause to avoid circular imports. The example: +Type hints: -```python -from __future__ import annotations -from typing import TYPE_CHECKING +- Make development and debugging faster, +- Make it easier for a user to see the data types of the inputs and outputs of methods and functions, +- Support static type checking tools such as [`mypy`](https://mypy-lang.org/), which check your code to ensure types are correct. +You should consider adding type hinting to your code if: -if TYPE_CHECKING: - from my_package import my_data_class +- Your package performs data processing, +- You use functions that require complex inputs, +- You want to lower the barrier to entry for new contributors to help you with your code. +```{admonition} Beware of too much type hinting +:class: caution -def my_function(data: my_data_class) -> None: - ... +As you add type hints to your code, consider that in some cases: +- If you have a complex code base, type hints may make code more difficult to read. This is especially true when a parameter’s input takes multiple data types and you list each one. +- Writing type hints for simple scripts and functions that perform obvious operations doesn't make sense. ``` -Unfortunately, the solution is *dirty* because we have to -use the `if TYPE_CHECKING` clause and `from __future__ import annotations` import to make it work. It make our -script messier! Type hinting is not only the roses and butterflies! - -The nice feature of type hinting is that we can define variable's type within a function: - -```python -from typing import Dict -import numpy as np +### Gradually adding type hints +Adding type hints can take a lot of time. 
However, you can add +type hints incrementally as you work on your code. 
-def validate_model_input(data: np.ndarray) -> Dict: - """ - Function checks if dataset has enough records to perform modeling. - - Parameters - ---------- - data : np.ndarray - Input data. - - Returns - ------- - : Dict - Dictionary with `data`, `info` and `status` to decide if pipeline can proceed with modeling. - """ - - output: Dict = None # type hinting - - # Probably we don't have the lines below yet - - # if data.shape[0] > 50: - # output = {"data": data, "info": "Dataset is big enough for statistical tests.", "status": True} - # else: - # output = {"data": data, "info": "Dataset is too small for statistical tests.", "status": False} - - return output - +```{tip} +Adding type hints is also a great task for new contributors. It will +help them get to know your package's code and structure better before +digging into more complex contributions. ``` - -We will use this feature rarely. The most probable scenario is when we start defining a function and its output, but -we don't know how we will process data. In this context, we can still run type checking to be sure that the -function behaves as we expect within the newly designed pipeline. - -(Another scenario: we will be forced to add type hints to silence dynamic type checkers from some IDEs ;) ). - - -### Type hinting: final remarks - -There are tools designed for static type checking. The most popular one is [`mypy`](https://mypy.readthedocs.io/en/stable/). -Adding it to your Continuous Integration (CI) pipeline is a good idea. -Other tools are integrated with popular IDEs like `PyCharm` or `VSCode`; most are based on `mypy` logic. - -The last thing to remember is that **type hints are optional in all our functions, and we can introduce them gradually, -which won't damage our code and output generated by CI type checking tools**. -It is a very convenient way of using this extraordinary feature!
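
The type-hinted `add_me` signature added in the last hunk is shown only as a fragment (the signature and the first docstring line). As a rough sketch of how the pieces described in the new text might fit together, the block below combines the type hints, a NumPy-style docstring, and a doctest example in one complete function. The function body, the reworded docstring summary, and the `Returns` and `Examples` sections are assumptions made for illustration; the patch itself does not show them.

```python
def add_me(aNum: int, aNum2: int) -> int:
    """Add two integers and return the result.

    Note: the body and this docstring are illustrative assumptions;
    the patch only shows the signature of ``add_me``.

    Parameters
    ----------
    aNum : int
        The first integer value.
    aNum2 : int
        The second integer value.

    Returns
    -------
    int
        The sum of ``aNum`` and ``aNum2``.

    Examples
    --------
    >>> add_me(1, 3)
    4
    """
    # The type hints above are not enforced at runtime; a static
    # checker such as mypy is what reports mismatched types.
    return aNum + aNum2
```

Saved to a file, the example in the docstring can be exercised with `python -m doctest <file>.py`, and the hints can be checked separately with a static type checker such as `mypy`. Keeping the types in both the docstring and the signature is one possible convention; whether to repeat them in both places is a project-level choice.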