diff --git a/conf.py b/conf.py
index 921251bb0..397e57aaf 100644
--- a/conf.py
+++ b/conf.py
@@ -128,7 +128,7 @@
 html_theme = 'pydata_sphinx_theme'
 html_static_path = ["_static"]
 html_css_files = ["pyos.css"]
-html_title = "pyOpenSci Package Guide"
+html_title = "pyOpenSci Python Packaging Guide"
 html_logo = "images/logo/logo.png"
 # Add any paths that contain custom static files (such as style sheets) here,
diff --git a/images/python-package-tools-2022-survey-pypa.png b/images/python-package-tools-2022-survey-pypa.png
new file mode 100644
index 000000000..c3d7d4f8a
Binary files /dev/null and b/images/python-package-tools-2022-survey-pypa.png differ
diff --git a/images/python-package-tools-decision-tree.png b/images/python-package-tools-decision-tree.png
new file mode 100644
index 000000000..67309790c
Binary files /dev/null and b/images/python-package-tools-decision-tree.png differ
diff --git a/index.md b/index.md
index 654054b44..87a3d4dff 100644
--- a/index.md
+++ b/index.md
@@ -1,18 +1,20 @@
 # pyOpenSci Python Open Source Package Development Guide
+
 ```{toctree}
 :hidden:
 :caption: Documentation
-Documentation
-```
+Documentation Overview
+```
 ```{toctree}
 :hidden:
 :caption: Packaging
-Packaging
+Packaging
+
 ```
 ```{toctree}
diff --git a/package-structure-code/complex-python-package-builds.md b/package-structure-code/complex-python-package-builds.md
new file mode 100644
index 000000000..f919e2e45
--- /dev/null
+++ b/package-structure-code/complex-python-package-builds.md
@@ -0,0 +1,152 @@
+# Complex Python package builds
+
+This guide is focused on packages that are either pure-python or that
+have a few simple extensions in another language such as C or C++.
+
+If your package is more complex, [you may want to refer to this guide
+created by Ralf Gommers on Python packaging.](https://pypackaging-native.github.io/)
+
+## Pure Python Packages vs. packages with extensions in other languages
+
+You can classify Python package complexity into three general categories.
These
+categories can in turn help you select the correct package front-end and
+back end tools.
+
+1. **Pure-python packages:** these are packages that only rely on Python to function. Building a pure Python package is simpler. As such, you can choose a tool below that
+has the features that you want and be done with your decision!
+2. **Python packages with non-Python extensions:** These packages have additional components called extensions written in other languages (such as `C` or `C++`). If you have a package with non-Python extensions, then you need to select a build back-end tool that allows you to add additional build steps needed to compile your extension code. Further, if you wish to use a front-end tool to support your workflow, you will need to select a tool that
+supports additional build steps. In this case, you could use setuptools. However, we suggest that you choose a build tool that supports custom build steps such as Hatch with Hatchling or PDM. PDM is an excellent choice as it allows you to also select your build back end of choice. We will discuss this at a high level on the complex builds page.
+3. **Python packages that have extensions written in different languages (e.g. Fortran and C++) or that have non-Python dependencies that are difficult to install (e.g. GDAL):** These packages often have complex build steps (more complex than a package with just a few C extensions for instance). As such, these packages require tools such as [scikit-build](https://scikit-build.readthedocs.io/en/latest/)
+or [meson-python](https://mesonbuild.com/Python-module.html) to build. NOTE: you can use meson-python with PDM.
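As a rough sketch of what the third category can look like in a `pyproject.toml`, assuming meson-python is the chosen build back-end: the `requires` entry and the `mesonpy` back-end name come from the meson-python project rather than from this guide, and a real project would also need a `meson.build` file that is not shown here.

```toml
# Illustrative fragment only; it is an assumption, not part of the guide text above.
[build-system]
# meson-python is assumed as the back-end that compiles the extensions
requires = ["meson-python"]
build-backend = "mesonpy"
```

Version pins are left out on purpose; a real project would normally pin the back-end to a version it has tested against.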
+
diff --git a/package-structure-code/intro.md b/package-structure-code/intro.md
new file mode 100644
index 000000000..428bcd0dd
--- /dev/null
+++ b/package-structure-code/intro.md
@@ -0,0 +1,88 @@
+# Python package structure information
+
+This section provides guidance on your Python package's structure, code formats and style. It also reviews the various packaging tools that you can use to
+support building and publishing your package.
+
+```{note}
+If you are considering submitting a package for peer review, have a look at the
+bare-minimum [editor checks](https://www.pyopensci.org/peer-review-guide/software-peer-review-guide/editor-in-chief-guide.html#editor-checklist-template) that pyOpenSci
+performs before a review begins. These checks are useful to explore
+for both authors planning to submit a package to us for review and for
+anyone who is just getting started with creating a Python package.
+
+In general these are basic items that should be in any open software repository.
+```
+
+## What you will learn here
+
+In this section of our Python packaging guide, we:
+
+* Provide an overview of the options available to you when packaging your tool
+* Suggest tools and approaches that both meet your needs and also support existing standards.
+* Suggest tools and approaches that will allow you to expand upon a workflow that may begin as a pure Python tool and evolve into a tool that requires additional layers of complexity in the packaging build.
+* Align our suggestions with the most current, accepted
+[PEPs (Python Enhancement Proposals)](https://peps.python.org/pep-0000/) and the [scientific-python community SPECs](https://scientific-python.org/specs/).
+* In an effort to maintain consistency within our community, we also align with existing best practices being implemented by developers of core Scientific Python packages such as NumPy, SciPy and others.
+
+## Guidelines for pyOpenSci's packaging recommendations
+
+The flexibility of the Python programming language lends itself to a diverse
+range of tool options for creating a Python package. Python is so flexible that
+it is one of the few languages that can be used to wrap around other languages.
+
+If you are building a pure Python package, then your packaging setup can be
+simple. However, some scientific packages have complex requirements as they may
+need to support extensions or tools written in other languages such as C or C++.
+
+To support the many different uses of Python, there are many ways to create a
+Python package. In this guide, we suggest packaging approaches and tools based
+upon:
+
+1. What we think will be best and easiest to adopt for those who are newer to packaging.
+2. Tools that we think are well maintained and documented.
+3. A shared goal of standardizing packaging approaches across this (scientific) Python ecosystem.
+
+Here, we also try to align our suggestions with the most current, accepted
+[Python community](https://packaging.python.org/en/latest/) and [scientific community](https://scientific-python.org/specs/) standards.
+
+```{admonition} Suggestions in this guide are not pyOpenSci review requirements
+:class: important
+
+The suggestions for package layout in this section are made with the
+intent of being helpful; they are not specific requirements for your
+package to be reviewed and accepted into our pyOpenSci open source ecosystem.
+
+Please check out our [package scope page](https://www.pyopensci.org/software-peer-review/about/package-scope.html) and [review requirements in our author guide](https://www.pyopensci.org/software-peer-review/how-to/author-guide.html#) if you are looking for Python package review requirements!
+```
+
+```{toctree}
+:hidden:
+:caption: Package structure & code style
+
+Intro
+
+Python package structure
+pyproject.toml Package Metadata
+What are SDist & Wheel Files?
+Package Build Tools
+Complex Builds
+```
diff --git a/package-structure-code/pyproject-toml-python-package-metadata.md b/package-structure-code/pyproject-toml-python-package-metadata.md
new file mode 100644
index 000000000..c9905fb7b
--- /dev/null
+++ b/package-structure-code/pyproject-toml-python-package-metadata.md
@@ -0,0 +1,104 @@
+# Use a pyproject.toml file for your package configuration & metadata
+
+The standard file that Python packages use to [specify build requirements and
+metadata is called a **pyproject.toml**](https://packaging.python.org/en/latest/specifications/declaring-project-metadata/). Adding metadata, build requirements
+and package dependencies to a **pyproject.toml** file replaces storing that
+information in a setup.py or setup.cfg file.
+
+The **pyproject.toml** file is written in [TOML (Tom's Obvious, Minimal Language) format](https://toml.io/en/). TOML is an easy-to-read structure that is founded on `key = value` pairs. Each section in the **pyproject.toml** file contains a `[table identifier]`.
+Below that table identifier are key value pairs that
+support configuration for that particular table.
+
+### Benefits of using a pyproject.toml file
+
+Including your package's metadata in a separate human-readable **pyproject.toml**
+format also allows someone to view the project's metadata in a GitHub repository.
+
+```{admonition} Setup.py is still useful for complex package builds
+:class: tip
+
+Using **setup.py** to manage package builds and metadata [can cause problems with package development](https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html).
+In some cases where a Python package build is complex, a **setup.py** file may
+be required. While this guide will not cover complex builds, we will provide
+resources for working with complex builds in the future.
+
+```
+
+### Example pyproject.toml for building using PDM
+Below is an example build configuration for a Python project.
This example
+package setup uses:
+
+* **pdm.pep517.api** to build the [package's SDist and wheels](python-package-distribution-files-sdist-wheel)
+
+```
+[build-system]
+requires = ["pdm-pep517>=1.0.0"]
+build-backend = "pdm.pep517.api"
+
+[project]
+name = "examplePy"
+authors = [
+    {name = "Some Maintainer", email = "some-email@pyopensci.org"}
+]
+maintainers = [{name = "All the contributors"}]
+license = {text = "BSD 3-Clause"}
+description = "An example Python package used to support Python packaging tutorials"
+keywords = ["pyOpenSci", "python packaging"]
+readme = "README.md"
+
+dependencies = [
+    "dependency-package-name-1",
+    "dependency-package-name-2",
+]
+```
+Notice that dependencies are specified in this file.
+
+### Example pyproject.toml for building using setuptools
+
+The package metadata including authors, keywords, etc. is also easy to read.
+Below you can see the same toml file that uses a different build system (setuptools).
+Notice how simple it is to swap out the tools needed to build this package!
+
+In this example package setup you use:
+
+* **setuptools** to build the [package's SDist and wheels](python-package-distribution-files-sdist-wheel)
+* **setuptools_scm** to manage package version updates using version control tags
+
+In the example below `[build-system]` is the first table
+of values. It has two keys that specify the build requirements and the build back-end for a package:
+
+1. `requires =`
+1. `build-backend =`
+
+```
+[build-system]
+requires = ["setuptools>=45", "setuptools_scm[toml]>=6.2"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "examplePy"
+authors = [
+    {name = "Some Maintainer", email = "some-email@pyopensci.org"}
+]
+maintainers = [{name = "All the contributors"}]
+license = {text = "BSD 3-Clause"}
+description = "An example Python package used to support Python packaging tutorials"
+keywords = ["pyOpenSci", "python packaging"]
+readme = "README.md"
+
+dependencies = [
+    "dependency-package-name-1",
+    "dependency-package-name-2",
+]
+```
+
+```{note}
+[Click here to read about our packaging build tools including PDM, setuptools, Poetry and Hatch.](/package-structure-code/python-package-build-tools)
+```
diff --git a/package-structure-code/python-package-build-tools.md b/package-structure-code/python-package-build-tools.md
new file mode 100644
index 000000000..8356a4d6c
--- /dev/null
+++ b/package-structure-code/python-package-build-tools.md
@@ -0,0 +1,588 @@
+# Python Package Build Tools
+
+There are several different build tools that you can use to [create your Python package's *SDist* and *Wheel* distributions](python-package-distribution-files-sdist-wheel). Below, we discuss the features,
+benefits and limitations of the most commonly used Python packaging tools.
+We focus on pure-python packages in this guide. However, we also
+highlight tools that currently support packages with C/C++ and other language
+extensions.
+
+:::{figure-md} fig-target
+
+Figure showing... will finish this once we are all happy with the figure and it's not going to change more...
+
+Diagram showing the various front-end build tools that you can select from. Each tool has different features as highlighted below.
+NOTE: this is still a DRAFT so i'm not going to spend time truly cleaning it up until i get lots of feedback on the general approach!!
+:::
+
+If you want to know more about Python packages that have extensions written in
+other languages, [check out the page on complex package builds.](complex-python-package-builds)
+
+## Build front-end vs. build back-end tools
+
+To better understand your options when it comes to building a Python package, it's important to first understand the difference between a
+build tool front-end and build back-end.
+
+### Build back-ends
+Most packaging tools have a back-end
+build tool that builds your package and creates associated
+[(SDist and wheel) distribution files](python-package-distribution-files-sdist-wheel). Some tools, such as **Flit**, only
+support pure-Python package builds. A pure-Python build refers
+to a package build that does not have extensions that are written in another
+programming language (such as `C` or `C++`).
+
+Other packages that have C and C++ extensions (or that wrap other languages such as Fortran) require additional code compilation steps when built.
+Back-ends such as **setuptools.build**, **meson.build**
+and **scikit-build** support complex builds with custom steps. If your
+build is particularly complex (i.e. you have more than a few `C`/`C++`
+extensions), then we suggest you use **meson.build** or **scikit-build**.
+
+### Python package build front-ends
+
+A packaging front-end tool refers to a tool that makes it easier for you to
+perform common packaging tasks using similar commands. These tasks include:
+
+* [Building your package (creating the SDist and Wheel distributions)](python-package-distribution-files-sdist-wheel)
+* Installing your package in a development mode (so it updates when you update your code)
+* Publishing to PyPI
+* Running tests
+* Building documentation
+* Managing an environment or multiple environments in which you need to run tests and develop your package
+
+There are several Python packaging tools that you can use for pure Python
+builds.
Each front-end tool discussed below supports a slightly different set of Python
+packaging tasks.
+
+For instance, you can use the packaging tools **Flit**, **Hatch** or **PDM**
+to both build and publish your package to PyPI. However, while **Hatch** and
+**PDM** support versioning and environment management, **Flit** does not. If you want a tool that supports dependency
+locking, you can use **PDM** or **Poetry** but not **Hatch**.
+
+```{note}
+If you are using **Setuptools**, there is no user-friendly build front-end that performs multiple tasks. You will need to use **build** to build your package and **twine** to publish to PyPI.
+```
+
+### Example build steps that can be simplified using a front-end tool
+
+Below, you can see how a build tool streamlines your packaging experience. Example to build your package with **Hatch**:
+
+```bash
+# Build your SDist and .whl files
+hatch build
+
+# Example to publish to test PyPI:
+hatch publish --repo test
+```
+
+Example build steps using the **setuptools** backend and **build**:
+
+```bash
+# Build the package
+python3 -m build
+
+# Publish to test PyPI using twine
+twine upload -r testpypi dist/*
+```
+
+## Choosing a build back-end
+
+Most front-end packaging tools have their own back-end build tool. The build
+tool creates your package's (SDist and Wheel) distribution files. For pure
+Python packages, the main differences between the build back-ends
+discussed below are:
+
+* How configurable they are - for example, do they allow you to add build steps that support non-Python extensions?
+* How much you need to configure them to ensure the correct files are included in your SDist and Wheel distributions.
+
+### Build backend support for non pure-python packages
+
+It is important to note that some build back-ends, such as **Flit-core**, only support
+pure Python builds.
Other back ends support C and C++ extensions as follows:
+
+* setuptools supports builds using C / C++ extensions
+* Hatchling (hatch's backend) supports C / C++ extensions via plugins that the developer creates to customize a build
+* PDM's backend supports C / C++ extensions by using setuptools
+* Poetry's backend supports C/C++ extensions; however, this functionality is currently undocumented. As such we don't recommend using Poetry for complex or non pure Python builds until it is documented.
+
+While we won't discuss more complex builds below, we will identify which tools
+have documented support for C / C++ extensions.
+
+## An ecosystem of Python build tools
+
+Below we introduce several of the most commonly used Python packaging build
+front-end tools. We highlight the features that each tool offers as a way to
+help you decide what tool might be best for your workflow.
+
+```{admonition} We do not suggest using setuptools
+:class: note
+
+We suggest that you pick one of the modern tools discussed on this page rather than
+setuptools because setuptools will require some additional knowledge
+to set up correctly.
+
+We review setuptools as a backend because it is still popular. However, it is
+not the most user-friendly option.
+```
+
+The most commonly used tools in the ecosystem are the
+setuptools backend (with build) and Poetry (a front end tool with numerous
+features and excellent documentation).
+
+:::{figure-md} pypa-survey-plot
+
+Graph showing the results of the 2022 PyPA survey of Python packaging tools. On the x axis is percent response and on the y axis are the tools.
+
+The Python developers survey results (n > 8,000 PyPI users) show setuptools and poetry as the most commonly used Python packaging tools. The core tools that we've seen being used in the scientific community are included here.
[You can view the full survey results by clicking here.](https://drive.google.com/file/d/1U5d5SiXLVkzDpS0i1dJIA4Hu5Qg704T9/view) NOTE: these data represent maintainers across domains and are likely weighted heavily toward those in web development. So this represents a snapshot across the broader Python ecosystem.
+:::
+
+## Choose a build workflow tool
+
+The tools that we review below include:
+
+* Twine, Build + setuptools
+* Flit
+* Hatch
+* PDM
+* Poetry
+
+When you are selecting a tool, you might consider this general workflow of
+questions:
+
+1. **Is your tool pure python? Yes?** You can use any tool that you wish! Pick the tool that has the features that you want to use in your build workflow. We suggest:
+* flit, hatch, PDM or poetry (read below for more)
+1. **Does your tool have a few C or C++ extensions?** Great, we suggest using
+**PDM** for the time being. It is the only tool in the list below that has a documented
+workflow to support such extensions. It also supports other backends such as scikit-build and meson-python that will allow you to fully customize your build.
+
+NOTE: You can also use Hatch for non pure python builds but you will need to
+write your own plugin for this support.
+
+## Python packaging tools summary
+
+Below, we summarize features offered by the most popular build front end tools.
+Note that because setuptools does not offer a front-end interface, it is not
+included in the table.
+
+### Package tool features table
+
+```{csv-table}
+:header: Feature, Flit, Hatch, PDM, Poetry
+:widths: 36, 10,10,10,10
+
+Default Build Back-end, Flit-core, Hatchling, PDM, Poetry-core
+Use Other Build Backends,✖ , ✖,✅ ,✖
+Dependency management, ✖,✖,✅,✅
+Publish to PyPI, ✅,✅,✅,✅
+Version Control based versioning (using `git tags`),✖,✅,✅,✅
+Version bumping,✖,✅, ✅, ✅
+More than one maintainer? (bus factor),✖,✖, ✖, ✅
+```
+
+Notes:
+
+* *Hatch plans to support using other backends and dependency management in the future*
+* Poetry supports semantic versioning. Thus, it will support version bumping following commit messages if you use a tool such as Python Semantic Release
+
+## PDM
+
+[PDM is a Python packaging and dependency management tool](https://pdm.fming.dev/latest/).
+PDM supports builds for pure Python projects. It also provides multiple layers of
+support for projects that have C and C++ extensions.
+
+```{admonition} PDM support for C and C++ extensions
+
+PDM supports using the PDM-backend and setuptools at the same time.
+This means that you can run setuptools to compile and build C extensions.
+PDM's build backend receives the compiled extension files (.so, .pyd) and
+packages them with the pure Python files.
+```
+
+### PDM Features
+
+```{csv-table}
+:header: Feature, PDM, Notes
+:widths: 20,5,50
+
+Use Other Build Backends, ✅, When you set up PDM it allows you to select from Hatch; PDM-517 and PDM-core build tools. PDM also can work with Meson-Python which supports more complex Python builds.
+Dependency management & lock files ,✅,PDM and Poetry are currently the only tools that support creating dependency lock files. However their approach to locking files is different: Poetry uses an upper bound lock approach `^`. PDM uses an open lock `>=`. Lock files might be most useful to developers creating web apps where locking the environment is critical for consistent user experience.
+Select your environment manager of choice (conda; venv; etc),✅ , PDM allows you to select the environment manager that you want to use for managing your package.
+Publish to PyPI,✅,PDM supports publishing to both test PyPI and PyPI
+Version Control based versioning,✅ , PDM has a setuptools_scm like tool built into its package which allows you to use dynamic versioning that relies on git tags.
+Version bumping, ✅ , PDM supports you bumping the version of your package using standard semantic version terms patch; minor; major
+Follows current packaging standards,✅,PDM supports current packaging standards for adding metadata to the **pyproject.toml** file. It also supports PEP 582 dependency management which relies upon a local directory containing a user's environment.
+Install your package in editable mode,✅,PDM supports installing your package in editable mode. **TODO: add info - does it support this and what does that look like? i think it does it when you create an envt??**
+Build your SDist and wheel distributions,✅,
+✨Optional use of PEP 582 / local environment directory✨,✅, PDM is currently the only tool that optionally supports PEP 582 (having a local environment configuration stored within a `__pypackages__` directory in your working package directory).
+```
+
+```{admonition} PDM vs. Poetry
+The functionality of PDM is similar to Poetry. However, PDM also offers
+additional, documented support for C extensions and version control based
+versioning. If you are deciding between the two tools, the main difference
+is that Poetry follows strict semantic versioning. Strict adherence to semantic
+versioning can be problematic in some cases (more on that below).
+```
+
+### Challenges with PDM
+
+PDM is a full-featured packaging tool. However, it is not without challenges:
+
+* Its documentation can be confusing, especially if you are new to
+packaging. For example, PDM doesn't provide an end to end beginning workflow in its documentation.
+* PDM also only has one maintainer currently. We consider individual maintainer
+teams to be a potential risk. If the maintainer finds they no longer have time
+to work on the project, it leaves users with a gap in support. Hatch and Flit
+also have single maintainer teams.
+
+[You can view an example of a package that uses PDM here](https://github.com/pyOpenSci/examplePy/tree/main/example4_pdm).
The README file for this directory provides you with
+an overview of what the PDM command line interface looks like when you use it.
+
+## Flit
+
+[Flit is a no-frills, streamlined packaging tool](https://flit.pypa.io/en/stable/) that supports modern Python packaging standards.
+Flit is a great choice if you are
+building a basic package to use in a local workflow that doesn't require any advanced features. More on that below.
+
+### Flit Features
+
+```{csv-table}
+:header: Feature, Flit, Notes
+:widths: 20,5,50
+
+Publish to PyPI and test PyPI,✅,Flit supports publishing to both test PyPI and PyPI
+Helps you add metadata to your pyproject.toml file,✅, .
+Follows current packaging standards,✅,Flit supports current packaging standards for adding metadata to the **pyproject.toml** file.
+Install your package in editable mode,✅,Flit supports installing your package in editable mode. However it does use a slightly different syntax from the usual `pip install -e .` to do so.
+Build your SDist and wheel distributions,✅, .
+```
+
+```{admonition} Learn more about flit
+* [Why use flit?](https://flit.pypa.io/en/stable/rationale.html)
+```
+
+### Why you might not want to use Flit
+
+Because Flit is no frills, it is best for basic, quick builds. If you are a
+beginner you may want to select Hatch or PDM which will offer you more support
+in common operations.
+You may NOT want to use flit if:
+
+* You want to set up more advanced version tracking and management (using version control for version bumping)
+* You want a tool that handles dependency versions (use PDM instead)
+* You have a project that is not pure Python (Use Hatch, PDM or setuptools)
+* Version Support: Flit uses the version from your package's `__version__`.
+
+## Hatch
+
+[**Hatch**](https://hatch.pypa.io/latest/), similar to Poetry and PDM, provides a
+unified command line interface.
Unlike Poetry and PDM, it also
+provides an environment manager for testing that will make it easier for
+you to run tests locally across different versions of Python. It also offers a
+nox / makefile like feature that allows you to create custom build workflows, such
+as building your documentation locally, that you may have created in the past
+using a tool like **Make** or **Nox**.
+
+### Hatch features
+
+```{csv-table}
+:header: Feature, Hatch, Notes
+:widths: 20,5,50
+
+Use Other Build Backends,✖, Switching out build back ends is not currently an option when using Hatch. However this feature is coming to the package in the near future.
+Dependency management,✅,Hatch can help you add dependencies to your `pyproject.toml` metadata.
+**??does hatch support this - i forget?** Select your environment manager of choice (conda; venv; etc),✅ , Hatch allows you to select the environment manager that you want to use for managing your package.
+Publish to PyPI and test PyPI,✅,Hatch supports publishing to both test PyPI and PyPI
+Version Control based versioning,✅ , Hatch offers `hatch_vcs` which is a plugin that uses setuptools_scm to support versioning using git tags. The workflow with `hatch_vcs` is the same as that with `setuptools_scm`.
+Version bumping, ✅ , Hatch supports you bumping the version of your package using standard semantic version terms patch; minor; major
+Follows current packaging standards,✅,Hatch supports current packaging standards for adding metadata to the **pyproject.toml** file.
+Install your package in editable mode,✅,Hatch supports installing your package in editable mode. **TODO: add info - does it support this and what does that look like? i think it does it when you create an envt??**
+Build your SDist and wheel distributions,✅, Hatch will build the SDist and wheel distributions
+✨Matrix environment creation to support testing across Python versions✨,✅, The matrix environment creation is a feature that is unique to Hatch in the packaging ecosystem. This feature is useful if you wish to test your package locally across Python versions (instead of using a tool such as tox).
+✨[Nox / MAKEFILE like functionality](https://hatch.pypa.io/latest/environment/#selection)✨, ✅, This feature is also unique to Hatch. This functionality allows you to create workflows in the **pyproject.toml** configuration to do things like serve docs locally and clean your package build directory. This means you may have one less tool in your build workflow.
+✨A flexible build backend: **hatchling**✨, ✅, **The hatchling build back-end offered by the maintainer of Hatch allows developers to easily build plugins to support custom build steps when packaging.
+
+```
+
+_** There is some argument about this approach placing a burden on maintainers to create a custom build system. But others appreciate the flexibility_
+
+### Why you might not want to use Hatch
+
+There are a few features that Hatch is missing that may be important for some.
+These include:
+
+Hatch:
+* Doesn't support dependency pinning
+* Currently doesn't support use with other build back ends. Lack of support for other build back ends makes Hatch less desirable for users with more complex package builds. If your package is pure
+Python, this won't be an issue. NOTE: there is a plan for this feature to be added in the upcoming months.
+* Doesn't allow you to select what environment manager you use.
+* Hatch doesn't provide an end to end beginning workflow in its documentation.
+* Hatch, similar to PDM and Flit, currently only has one maintainer.
+
+```{note}
+You can customize any aspect of the Hatch build by creating plugins.
+```
+
+## Poetry
+
+[Poetry is a full-featured build tool.](https://python-poetry.org/) It is also
+the second most popular front-end packaging tool (based upon the PyPA survey).
+Poetry is user-friendly and has clean and easy-to-read documentation.
+
+```{note}
+While some have used Poetry for Python builds with C/C++ extensions, this support
+is currently undocumented. Thus we don't recommend it for more complex builds.
+```
+
+### Poetry features
+
+```{csv-table}
+:header: Feature, Poetry, Notes
+:widths: 20,5,50
+
+Dependency management,✅,Poetry helps you add dependencies to your `pyproject.toml` metadata. _NOTE: currently Poetry adds dependencies using an approach that is slightly out of alignment with current Python PEPs - however there is a plan to fix this in an upcoming release._ Allows you to organize dependencies in groups: docs; package; tests.
+Dependency pinning,✖✅ ,Poetry offers dependency pinning however it does so in a way that can be problematic for some packages. Read below for more.
+Select your environment manager of choice (conda; venv; etc),✅ , Poetry allows you to either use its simple environment management tool or select the environment manager that you want to use for managing your package. [Read more about its built in environment management options](https://python-poetry.org/docs/basic-usage/#using-your-virtual-environment).
+Publish to PyPI and test PyPI,✅,Poetry supports publishing to both test PyPI and PyPI
+Version Control based versioning,✅ , The plugin [Poetry dynamic versioning](https://github.com/mtkennerly/poetry-dynamic-versioning) supports versioning using git tags with Poetry.
+Version bumping, ✅ , Poetry supports you bumping the version of your package using standard semantic version terms patch; minor; major
+Follows current packaging standards,✖✅,Poetry does not quite support current packaging standards for adding metadata to the **pyproject.toml** file but plans to fix this in an upcoming release.
+Install your package in editable mode,✅,Poetry supports installing your package in editable mode using `--editable`
+Build your SDist and wheel distributions,✅,
+```
+
+### Challenges with Poetry
+
+Some challenges of Poetry include:
+
+* Poetry pins dependencies using an "upper bound" limit specified with the `^` symbol. See breakout below for more regarding why this is potentially problematic.
+* Doesn't natively support version control based versioning (a plugin is required)
+* *Minor:* The way Poetry currently adds metadata to your pyproject.toml file does not follow current Python standards. However, this is going to be addressed when they release version 2.0.
+
+Poetry is an excellent tool. Use caution when pinning dependencies as
+Poetry's approach to pinning has been shown to be problematic for many builds.
+
+```{admonition} Challenges with Poetry dependency pinning
+:class: important
+
+Poetry pins dependencies using `^`. This `^` symbol means that there is
+an "upper bound" to the dependency. Thus Poetry will not bump a dependency
+version to a new major version. For example, if your package uses a dependency that
+is at version 1.2.3, Poetry will never bump the dependency to 2.0 even if
+there is a new major version of the package. Poetry will instead bump up to 1.9.x.
+
+Poetry does this because it adheres to strict semantic versioning which states
+that a major version bump (from 1.0 to 2.0 for example) means there are breaking
+changes in the tool. However, not all tools follow strict semantic versioning.
+[This approach has been found to be problematic by many of our core scientific packages.](https://iscinumpy.dev/post/bound-version-constraints/)
+
+This approach also won't support other ways of versioning tools, for instance,
+some tools use [calver](https://calver.org/) which creates new versions based on the date.
+```
+
+```{admonition} Hatch vs PDM vs Poetry
+:class: note
+There are some features that Hatch and PDM offer that Poetry does not.
+ +Hatch: offers matrix environment management that allows you to run tests across +Python versions. It also offers a Nox / Makefile like tool to streamline your +build workflow. +PDM: does not offer matrix environments or Nox / Makefile like tools. It does +offer dependency management but adheres to a >= approach when pinning. This +avoids the issue described above with Poetry's upper bound pinning. +``` + +## Using Setuptools Back-end for Python Packaging + +[Setuptools](https://setuptools.pypa.io/en/latest/) is the most +mature Python packaging build tool with [development dating back to 2009 and earlier](https://setuptools.pypa.io/en/latest/history.html#). +Setuptools also has the largest number of community users (according to the PyPA +survey). Setuptools does not offer a user +front-end like Flit, Poetry and Hatch offer. As such you will need to use other +tools such as **build** to create +your package distributions and **twine** to publish to PyPI. + +While setuptools is the most commonly used tool, we encourage package maintainers +to consider using a more modern tool for packaging such as Poetry, Hatch or PDM. + +We discuss setuptools here because it's commonly found in the ecosystem and +contributors may benefit from understanding it. + +### Setuptools Features + +Some of the features of setuptools include: + +* Fully customizable build workflow +* Many scientific Python packages use it. +* It offers version control based package versioning using **setuptools_scm** +* It supports modern packaging using **pyproject.toml** for metadata +* It supports backwards compatibility with older packaging approaches. + +### Challenges using setuptools + +Setuptools has a few challenges: + +* Because **setuptools** has to maintain backwards compatibility across a range of packages, it is +not as flexible in its adoption of modern Python packaging +standards. +* The above-mentioned backwards compatibility makes for a more complex code-base. 
+* Your experience as a user will be less streamlined and simple using setuptools compared to other tools discussed on this page. + +There are also some problematic default settings that users should be aware of +when using setuptools. For instance: + +* setuptools will build a project without a name or version if you are not using a **pyproject.toml** file +to store metadata. +* setuptools will also include all of the files in your package +repository if you do not explicitly tell it to exclude files using a +**MANIFEST.in** file. + + + + + + + + + diff --git a/package-structure-code/python-package-distribution-files-sdist-wheel.md b/package-structure-code/python-package-distribution-files-sdist-wheel.md new file mode 100644 index 000000000..43e0ca715 --- /dev/null +++ b/package-structure-code/python-package-distribution-files-sdist-wheel.md @@ -0,0 +1,182 @@ +# The Python Package Source and Wheel Distributions + +There are two core distribution files +that you need to create to publish your Python package to +PyPI: SDist and Wheel. The SDist (Source Distribution) contains the raw source +code for your package. The Wheel (.whl) contains the built / compiled files +that can be directly installed onto anyone's computer. + +Learn more about both distributions below. + +```{note} +If your package is a pure Python package with no additional +build / compilation steps then the SDist and Wheel distributions will have +similar content. However if your package has extensions in other languages +or is more complex in its build, the two distributions will be very different. + +Also note that we are not discussing conda build workflows in this section. +[You can learn more about conda builds here.](https://conda.io/projects/conda-build/en/latest/user-guide/tutorials/index.html) +``` + +### Source Distribution (SDist) + +**Source files** are the unbuilt files needed to build your +package.
These are the "raw / as-is" files that you store on GitHub or whatever +platform you use to manage your code. + +**S**ource **D**istributions are referred to as SDist. As the name implies, an SDist contains the source code; it has not been +built or compiled in any way. Thus, when a user installs your source +distribution using pip, pip needs to run a build step first. An SDist is normally stored as a `.tar.gz` archive (often called a "tarball"). + +Below is an example SDist for the stravalib Python package: + + + +``` +stravalib-1.1.0.post2-SDist.tar.gz file contents + +├─ 📂 stravalib +│ ├─ tests +│ │ ├─ integration +│ │ │ ├─ __init__.py +│ │ │ ├─ conftest.py +│ │ │ ├─ strava_api_stub.py +│ │ │ └─ test_client.py +│ │ ├─ unit +│ │ │ ├─ __init__.py +│ │ │ ├─ test_attributes.py +│ │ │ ├─ ... +│ │ ├─ __init__.py +│ │ ├─ auth_responder.py +│ │ └─ test.ini-example +│ ├─ util +│ │ ├─ __init__.py +│ │ └─ limiter.py +│ ├─ __init__.py +│ ├─ _version.py +│ ├─ _version_generated.py +│ ├─ attributes.py +│ ├─ ... +├─ stravalib.egg-info +│ ├─ PKG-INFO +│ ├─ SOURCES.txt +│ ├─ dependency_links.txt +│ ├─ requires.txt +│ └─ top_level.txt +├─ CODE_OF_CONDUCT.md +├─ CONTRIBUTING.md +├─ LICENSE.txt +├─ MANIFEST.in +├─ Makefile +├─ PKG-INFO +├─ README.md +├─ changelog.md +├─ environment.yml +├─ pyproject.toml +├─ requirements-build.txt +├─ requirements.txt +└─ setup.cfg + +``` + +```{admonition} GitHub archive vs SDist +:class: tip +When you make a release on GitHub, it creates a `git archive` that contains all +of the files in your GitHub repository. While these files are similar to an +SDist, these two archives are not the same. The SDist contains a few other +items including a metadata directory and, if you use `setuptools_scm` or `hatch_vcs`, +the SDist will also contain a `_version.py` file. +``` + + + +### Wheel (.whl files) + +A wheel file is a ZIP-format archive whose filename follows a specific format +(below) and has the extension `.whl`.
The `.whl` archive contains a specific +set of files, including metadata that are generated from your project's +pyproject.toml file. The pyproject.toml and other files that may be included in +source distributions are not included in wheels because a wheel is a built +distribution. + +The wheel (.whl) is your built binary distribution. **Binary files** are the built / compiled source files. These files are ready to be installed. A wheel (**.whl**) is a **.zip** file containing all of the files needed to directly install your package. Everything in a wheel is already built - any code that needs compiling has already been compiled. Wheels are thus faster to install - particularly if you have a package that requires build steps. + + The wheel does not contain any of your +package's configuration files such as **setup.cfg** or **pyproject.toml**. This +distribution is already built so it's ready to install. + +Because it is built, the wheel file will be faster to install for pure Python +projects and can lead to consistent installs across machines. + + + + + +```{tip} +Wheels are also useful in the case that a package +needs a **setup.py** file to support a more complex build. +In this case, because the files in the wheel bundle +are prebuilt, the user installing doesn't have to +worry about malicious code injections when it is installed. +``` + +The filename of a wheel contains important metadata about your package.
+ +Example: **stravalib-1.1.0.post2-py3-none-any.whl** + +* packageName: stravalib +* packageVersion: 1.1.0.post2 - here `post2` is a post-release segment of the version, not a separate build number [(read more about post here)](https://peps.python.org/pep-0440/#post-release-separators) +* py3: the Python tag; supports Python 3.x +* none: the ABI tag; the package does not depend on a specific Python ABI +* any: the platform tag; runs on any operating system (windows, mac, linux) and any computer processor / architecture + +What a wheel file looks like when unpacked (unzipped): + +``` +stravalib-1.1.0.post2-py3-none-any.whl file contents: + +├─ 📂 stravalib +│ ├─ tests +│ │ ├─ functional +│ │ │ ├─ __init__.py +│ │ │ ├─ test_client.py +│ │ ├─ unit +│ │ │ ├─ __init__.py +│ │ │ ├─ test_attributes.py +│ │ ├─ __init__.py +│ │ ├─ auth_responder.py +│ │ └─ test.ini-example +│ ├─ util +│ │ ├─ __init__.py +│ │ └─ limiter.py +│ ├─ __init__.py +│ ├─ _version.py +│ ├─ _version_generated.py +│ ├─ attributes.py +│ ├─ client.py +└─ stravalib-1.1.0.post2.dist-info # Package metadata are stored here + ├─ LICENSE.txt + ├─ METADATA + ├─ RECORD + ├─ WHEEL + └─ top_level.txt + +``` + +```{tip} +[Read more about the wheel format here](https://pythonwheels.com/) +``` diff --git a/package-structure-code/python-package-structure.md b/package-structure-code/python-package-structure.md new file mode 100644 index 000000000..5bff58d49 --- /dev/null +++ b/package-structure-code/python-package-structure.md @@ -0,0 +1,222 @@ +# Python Package Structure for Scientific Python Projects + +## Directories that should be in all Python packages + +There are several core directories that should be included in all Python packages: + +* **docs/:** discussed in our docs chapter, this directory contains your user-facing documentation website +* **tests/:** this directory contains the tests for your project code +* **package-name/:** this is the directory that contains the code for your Python project. It is normally named using your project's name. + +Every Python package should have all 3 of the above directories within it.
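
The three-directory rule above can be checked with a few lines of Python. This is only a sketch under stated assumptions: the helper name `missing_core_dirs` is hypothetical (it is not part of any packaging tool), and it assumes the package directory sits either at the repository root or under `src/`.

```python
from pathlib import Path


def missing_core_dirs(repo_root, package_name):
    """Return the names of the core directories that are absent.

    Hypothetical helper: looks for docs/, tests/ and the package
    directory, accepting either <root>/<package> or <root>/src/<package>.
    """
    root = Path(repo_root)
    # docs/ and tests/ are expected directly under the repository root.
    missing = [name for name in ("docs", "tests") if not (root / name).is_dir()]
    # The package directory may live at the root or inside src/.
    has_package = (root / package_name).is_dir() or (
        root / "src" / package_name
    ).is_dir()
    if not has_package:
        missing.append(package_name)
    return missing
```

Calling `missing_core_dirs(".", "mypackage")` from a repository root returns an empty list when all three directories are present.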
+ +## Src vs flat layouts +There are two different layouts that you will commonly see +within the Python packaging ecosystem: +[src and flat layouts.](https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/) +Both layouts have advantages for different groups of maintainers. + +The **src/package-name** approach nests your **package-name** directory, mentioned above as the directory where your code lives, into a **src/** directory like this: + +**src/package-name** + +In a flat layout approach, your package's code lives in a **package-name** directory +at the root of your package's repository. + +On this page we: + +1. Suggest the **src/package-name** layout structure for new packages. This layout prevents some commonly found issues with the flat layout (discussed below). +2. Introduce the flat layout as it is used in the scientific ecosystem. Currently this layout is the most common. As such it's good to be familiar with it in case you contribute to a package using a flat layout in the future! Or, perhaps +you maintain one now! + +```{admonition} pyOpenSci will never require a specific package structure for peer review +:class: important + +We understand that it would be a tremendous effort for existing +maintainers to move to a new layout. + +The overview on this page presents recommendations that we think are best for +someone getting started with Python packaging or someone whose package +has a simple build and who might be open to moving to a more fail-proof approach. +``` + +## The src/ layout for Python packages + +The **src/package-name** layout is the approach that we suggest +for new maintainers. It is also recommended in the +[PyPA packaging guide](https://packaging.python.org/en/latest/tutorials/packaging-projects/). We suggest the **src/package-name** layout because it +makes it easier for you to create a package build workflow that tests your +package as it will be installed on a user's computer.
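
To see which copy of a package an interpreter would actually import, you can ask the import system where it finds the module. The sketch below is illustrative rather than part of any packaging tool: `import_origin` and `is_imported_from` are hypothetical helper names, and the standard-library `json` module stands in for your own package.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def import_origin(module_name: str) -> Optional[Path]:
    """Return the file Python would load for module_name, or None."""
    spec = importlib.util.find_spec(module_name)
    # Built-in or missing modules have no file location to report.
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


def is_imported_from(module_name: str, directory: Path) -> bool:
    """True if the module would be loaded from somewhere under directory."""
    origin = import_origin(module_name)
    return origin is not None and directory.resolve() in origin.parents


if __name__ == "__main__":
    # "json" is a stand-in; replace it with your own package name.
    print(import_origin("json"))
    print(is_imported_from("json", Path.cwd()))
```

With a flat layout, running this from the repository root would typically report the local package directory. With a src/ layout and a regular (non-editable) install, it should instead report the installed location.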
+ +The key characteristic of this layout is that your package +uses a **src/package-name** directory structure. With this layout it is also +common to include your `tests/` directory outside of the package +directory. However, you may see some packages +that include tests within the **src/package-name** directory. + +```{admonition} Example scientific packages that use **src/package-name** layout + +* [Sourmash](https://github.com/sourmash-bio/sourmash) +* [bokeh](https://github.com/bokeh/bokeh) +* [openscm](https://github.com/openscm/openscm-runner) +* [awkward](https://github.com/scikit-hep/awkward) +``` + +### Pros of the src/ layout + +The benefits of the **src/package-name** layout approach include: + +* It ensures that tests always run on the installed version of your +package rather than on the flat files imported directly from your package. If you run your tests on your flat files, you may be missing issues that users may encounter with your package when it's installed. +* If `tests/` are outside of the **src/package-name** directory, they aren't by default +delivered to a user +installing your package. However, you can choose to add them to the package archive. When tests are not included in the package archive your package size will be slightly smaller. +* The **src/package-name** layout is semantically more clear. Code is always found in the +**src/package-name** directory, while `tests/` and `docs/` are in the root directory. + +```{tip} +* [Read more about reasons to use the **src/package-name** layout](https://hynek.me/articles/testing-packaging/) +``` + +An example of the **src/package-name** layout structure can be seen below. + +``` +myPackage +├── CHANGELOG.md ┐ +├── CODE_OF_CONDUCT.md │ +├── CONTRIBUTING.md │ +├── docs │ Package documentation +│ └── index.md +│ └── ...
│ +├── LICENSE │ +├── README.md ┘ +├── pyproject.toml ┐ +├── src │ +│ └── myPackage │ Package source code, metadata, +│ ├── __init__.py │ and build instructions +│ ├── moduleA.py │ +│ └── moduleB.py ┘ +└── tests ┐ + └── ... ┘ Package tests +``` + +## Core file requirements for a Python package + +In the above example, notice that all of the core documentation files that +pyOpenSci requires live in the root of your project directory. These files +include: + +* CHANGELOG.md +* CODE_OF_CONDUCT.md +* CONTRIBUTING.md +* LICENSE.txt +* README.md + +Also note that there is a **docs/** directory at the root where your user-facing +documentation website lives. + +```{button-link} https://www.pyopensci.org/python-package-guide/documentation +:color: primary +:class: sd-rounded-pill + +Click here to read about our packaging documentation requirements. +``` + + +```{important} +If your package tests require data, we suggest that you do NOT include that +data within your package structure. We will discuss this in more detail in a +tutorial. Including data in your package structure increases the size of your +distribution files. This places a maintenance toll on repositories like PyPI and +anaconda cloud that have to deal with thousands of package uploads. +``` + +## About the flat Python package layout + +Currently most scientific packages use the **flat-layout** given: + +* It's the most commonly found layout within the scientific Python ecosystem and +people tend to look to other packages / maintainers that they respect for examples +of how to build Python packages. +* Many Python tools depend upon tools in other languages and / or complex builds +with compilation steps. Many developers thus appreciate / are used to features +of the flat layout. + +While we present this layout here in our guide, we suggest that those just +getting started with Python packaging start with the src/package-name layout +discussed above.
Numerous packages in the ecosystem [have had to move to a +src/ layout](https://github.com/scikit-build/cmake-python-distributions/pull/145). + + +```{admonition} Why most scientific Python packages do not use the src/ layout +:class: tip + +In most cases the advantages of using the **src/package-name** layout for +larger scientific packages that already use a flat approach are not worth it. +Moving from a flat layout to a **src/package-name** layout would come at a significant cost to +maintainers. + +However, the advantages of using the **src/package-name** layout for a beginner are significant. +As such, we recommend that if you are getting started with creating a package, +you consider using a **src/package-name** layout. +``` + +## What does the flat layout structure look like? + +The flat layout's primary characteristics are: + +* The source code for your package lives in a directory with your package's +name in the root of your project directory +* Often the `tests/` directory also lives within that same `package-name` directory. + +Below you can see the recommended structure of a scientific Python package +using the flat layout. + +```bash +myPackage/ +├── CHANGELOG.md ┐ +├── CODE_OF_CONDUCT.md │ +├── CONTRIBUTING.md │ +├── docs/ │ Package documentation +│ └── ... │ +├── LICENSE │ +├── README.md ┘ +├── pyproject.toml ] Package metadata and build configuration +├── myPackage/ ┐ +│ ├── __init__.py │ Package source code +│ ├── moduleA.py │ +│ └── moduleB.py ┘ +└── tests/ ┐ + ├── test-file1.py │ Package tests + └── ... ┘ +``` + + +### Benefits of using the flat layout in your Python package + +There are some benefits to the scientific community in using the flat layout. + +* This structure has historically been used across the ecosystem and packages +using it are unlikely to change. +* You can import the package directly from the root directory. For +some this is ingrained in their respective workflows.
However, for a beginner +the danger of doing this is that you are not developing and testing against the +installed version of your package. Rather, you are working directly with the +flat files. + + +```{admonition} Core scientific Python packages that use the flat layout +:class: tip + +* [numpy](https://github.com/numpy/numpy) +* [scipy](https://github.com/scipy/scipy) +* [pandas](https://github.com/pandas-dev/pandas) +* [xarray](https://github.com/pydata/xarray) +* [Jupyter-core](https://github.com/jupyter/jupyter_core) +* [Jupyter notebook](https://github.com/jupyter/notebook) +* [scikit-learn](https://github.com/scikit-learn/scikit-learn) + +It would be a significant maintenance cost and burden to move all of these +packages to a different layout. The potential benefits of the source layout +for these tools are not worth the maintenance investment. +```