This repository provides rules for the Bazel build tool that allow your Python code to depend on pip packages using a standard requirements file. It utilizes Starlark wherever possible to provide hermetic rules, but uses Python and the pip library for dependency resolution and management in order to ensure that the development experience is consistent with that of pip.
This repository is designed to be support the use of both Python 2 and Python 3 in a single repo, execution on multiple platforms, and remote execution.
Add the following to your WORKSPACE file:
git_repository(
    name = "com_apt_itude_rules_pip",
    commit = "82d3393646982b233c7ed919a5b66ffb471dc649",
    remote = "https://github.com/apt-itude/rules_pip.git",
)
load("@com_apt_itude_rules_pip//rules:dependencies.bzl", "pip_rules_dependencies")
pip_rules_dependencies()All direct external pip dependencies should be defined in a requirements.txt file. This file should be located in a Bazel package (typically something like thirdparty/pip) and committed to source control.
The same requirements file may be used to define all pip dependencies accross your workspace and shared among Python versions and platforms. To define requirements that only pertain to a particular Python version or platform, you should use Environment Markers.
NOTE: At this time, only the sys_platform and python_version environment markers are supported. Using additional markers will not cause an error, but the pip_lock rule only generates one environment per major Python version per platform, so additional granularity in the requirements.txt file will be ignored.
Instantiate the pip_lock rule in a BUILD file, typically within the same package as the requirements.txt file. For example:
thirdparty/pip/BUILD
load("@com_apt_itude_rules_pip//rules:lock.bzl", "pip_lock")
pip_lock(
    name = "lock",
    requirements = ["requirements.txt"],
)Next, execute the pip_lock binary:
bazel run //thirdparty/pip:lockThis will do two things:
- Generate a requirements-lock.jsonfile alongside therequirements.txtfile, which locks all direct and transitive dependencies to a specific version
- Build wheels for any requirement that is not already distributed as a wheel on PyPI and store them in a wheelsdirectory alongside therequirements.txtfile
The requirements-lock.json file should be committed to source control.
The wheel files may either be source-controlled or published to a custom Python package index. See Managing built wheel files for more information.
Add the following to your WORKSPACE file:
load("@com_apt_itude_rules_pip//rules:repository.bzl", "pip_repository")
pip_repository(
    name = "pip",
    requirements = "//thirdparty/pip:requirements-lock.json",
)
load("@pip//:requirements.bzl", "pip_install")
pip_install()This creates an external workspace named @pip from which you can access all pip requirements. A py_library target is exposed for each requirement via the label @pip//<distro_name>, where <distro_name> is the canonical name of the Python distribution found in your requirements.txt file. The canonical name is all lowercase, with hyphens replaced by underscores. For example, PyYAML would become @pip//pyyaml and pytest-mock would become @pip//pytest_mock.
Simply add @pip//<distro_name> labels to the deps list of any py_library, py_binary, or py_test rule. For example:
some/package/BUILD
py_library(
    name = "dopecode"
    srcs = ["dopecode.py"],
    deps = [
        "@pip//pytest_mock",
        "@pip//pyyaml",
    ]
)When you build these targets, only the pip distributions that are directly or indirectly required by that target will be fetched.
To update all requirements simultaneously:
bazel run <label/of:pip_lock> -- -UTo update one or more requirements individually:
bazel run <label/of:pip_lock> -- -P <package-one> -P <package-two>There are two options for managing the wheels that are built when executing the pip_lock rule:
This is the simplest option. If you simply leave them in the directory in which they were built and commit them to source control, there is nothing more to do. However, this has the downside of bloating your repository with binary files, which can make operations like cloning and pulling take longer if there are large wheels or a large number of wheels. If you choose this option and you are using git, you may want to consider using Git LFS for this.
If you have the resources, you can also create a custom Python package index in which to host the wheels. Once you publish the wheels to the index, add the index URL as an argument to the pip_lock rule. For example:
pip_lock(
    name = "lock",
    requirements = ["requirements.txt"],
    args = ["--index-url", "https://custom.pypi.com"],
)Then, re-run the pip_lock binary:
bazel run //thirdparty/pip:lockThis will update the requirements-lock.json file to point to the remote wheel files and delete the unused local wheel files. Make sure you commit the updated lock file to source control.
Note that the --index-url argument is additive, so you may use it any number of times to expose multiple package indices.
TL;DR Pre-building wheels is an unfortunate but necessary compromise to make Bazel and pip play nicely.
PyPI provides two types of distributions: wheels and source distributions.
Wheels are pre-built archives that can simply be extracted onto the PYTHONPATH and used out of the box. This makes them ideal for use within Bazel because a repository rule can simply download the wheel file, extract it, and create a static BUILD file that globs for all the extracted files as sources to a py_library rule.
Source distributions, on the other hand, are archives of the source code, which means that their installation requires a build step. Pure Python code does not require building, but packages that contain C extensions, for example, must be compiled. For various reasons, often due to platform differences, many Python packages are only available as source distributions on PyPI (check out this website for availability of many popular distributions as wheels).
Creating py_library targets for source distributions can be accomplished in any of the following ways:
This is the easiest solution because it's the way pip normally works, which is why the original rules_pip rules did it this way. However, there are three major issues with this approach:
- Bazel provides very little visibility into the fetch process for users, so if a source distribution fails to build for any reason, users are left fumbling around in the dark and figuring out ways of manually executing the rules_piptools in order to reproduce the issue.
- There is no way to utilize the Bazel Python toolchain at this stage, so in order for the wheels to be built with the correct Python version(s), the system Python interpreter(s) must exactly match your toolchain. This defeats the purpose of using Bazel toolchains in the first place.
- Fetching external dependencies happens on the host platform (i.e. the platform on which Bazel is running), so this approach will only build the wheels for the host platform. If any wheels are platform-specific, this makes remote execution on a different platform impossible.
This would be the ideal solution because it's the most Bazel-y solution: start with source code and hermetically build on a target platform. However, this is extremely difficult to accomplish because of how setuptools works.
Since all Bazel rules are hermetic, the exact set of input and output files for each rule must be known during the loading phase, before anything has been built. This means that in order for a py_library, py_binary or py_test rule to depend on a source distribution, a complete and explicit list of of the files that would be produced by building that source distribution must be known ahead of time. Unfortunately, a distribution's setup.py file may execute arbitrary Python code at build time, which means that it's impossible (or at least way more difficult than I deem worthwhile) to know the outputs it will produce without actually building it.
This is the happy(ish) medium between the previous two approaches and is what rules_pip now expects. If all external pip dependencies are already wheels, then there is no build step to perform at either fetch time or execution time.
Obviously, it's not practical to expect that everything is distributed as a wheel or that developers will limit themselves to exclusively using packages that are distributed as wheels, so the pip_lock rule fills that gap. It provides the mechanism for building wheels in such a way that will both utilize your Python toolchain and work in a remote execution environment.
Although executing the binary produced by pip_lock rule is not hermetic, it is consistent with the way pip itself builds wheels because it actually uses the pip library.
If you are utilizing the original compile_pip_requirements and pip_repository rules, you can easily transition to the new ruleset.
In order to keep the original rules functional while migrating to the new ones, import rules_pip under a different external repository name and shadow any conflicting rule names. For example:
WORKSPACE
git_repository(
    name = "com_apt_itude_rules_pip_new",
    commit = "82d3393646982b233c7ed919a5b66ffb471dc649",
    remote = "https://github.com/apt-itude/rules_pip.git",
)
load(
    "@com_apt_itude_rules_pip_new//rules:dependencies.bzl",
    new_pip_rules_dependencies="pip_rules_dependencies",
)
new_pip_rules_dependencies()If you had separate requirements.in files for Python 2 and Python 3, you may continue to keep them separate and simply pass both to the pip_lock rule, like so:
load("@com_apt_itude_rules_pip//rules:lock.bzl", "pip_lock")
pip_lock(
    name = "lock",
    requirements = ["requirements-2.in", "requirements-3.in"],
)However, you must add the python_version < "3.0" or python_version >= "3.0" environment marker to every line of requirements-2.in and requirements-3.in, respectively, in order to tell the pip_lock rule which Python version each requirement belongs to.
If the two lists have a lot of overlap, it may be simpler to combine all requirements into a single requirements.txt file and selectively use the python_version environment marker only for those requirements that should be limited to one version.
Follow the steps outlined in Locking your requirements and Turning the requirements into Bazel dependencies.
If you previously instantiated two pip_repository rules named pip2 and pip3, simply find and replace all instances of @pip2 and @pip3 in your workspace with @pip.
Delete any usages of the compile_pip_requirements rule as well as any requirements.txt files that they produced. In addition, remove the load statement of the original com_apt_itude_rules_pip repo and the associated pip_rules_dependency instantiation from your WORKSPACE file.
Finally, you may rename com_apt_itude_rules_pip_new and new_pip_rules_dependencies in your WORKSPACE file to the canonical com_apt_itude_rules_pip and pip_rules_dependencies.