[ci] Setup Release pipeline and build release wheels with cache #5610
**New file** — the Buildkite release pipeline:

```diff
@@ -0,0 +1,21 @@
+steps:
+- block: "Build wheels"
+
+- label: "Build wheel - Python {{matrix.python_version}}, CUDA {{matrix.cuda_version}}"
+  agents:
+    queue: cpu_queue
+  commands:
+    - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg CUDA_VERSION={{matrix.cuda_version}} --build-arg PYTHON_VERSION={{matrix.python_version}} --tag vllm-ci:build-image --target build --progress plain ."
+    - "mkdir artifacts"
+    - "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image cp -r dist /artifacts_host"
+    - "aws s3 cp --recursive artifacts/dist s3://vllm-wheels/$BUILDKITE_COMMIT/"
+  matrix:
+    setup:
+      cuda_version:
+        - "11.8.0"
+        - "12.1.0"
+      python_version:
+        - "3.8"
+        - "3.9"
+        - "3.10"
+        - "3.11"
```
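Buildkite expands the `matrix.setup` block above into one "Build wheel" job per `(cuda_version, python_version)` pair. A standalone shell sketch of that expansion (the docker command is only echoed here, never run; it mirrors the pipeline step minus a few flags):

```shell
# Enumerate the same matrix the pipeline declares: 2 CUDA versions x 4 Python
# versions = 8 wheel-build jobs.
count=0
for cuda in 11.8.0 12.1.0; do
  for py in 3.8 3.9 3.10 3.11; do
    count=$((count + 1))
    echo "Build wheel - Python ${py}, CUDA ${cuda}"
    echo "  DOCKER_BUILDKIT=1 docker build --build-arg CUDA_VERSION=${cuda} --build-arg PYTHON_VERSION=${py} --tag vllm-ci:build-image --target build ."
  done
done
echo "total wheel-build jobs: ${count}"
```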
**Dockerfile** changes:

```diff
@@ -5,9 +5,26 @@
 # docs/source/dev/dockerfile/dockerfile.rst and
 # docs/source/assets/dev/dockerfile-stages-dependency.png
 
+ARG CUDA_VERSION=12.4.1
 #################### BASE BUILD IMAGE ####################
 # prepare basic build environment
-FROM nvidia/cuda:12.4.1-devel-ubuntu22.04 AS dev
+FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04 AS base
+
+ARG CUDA_VERSION
+ENV CUDA_VERSION=${CUDA_VERSION}
+ARG PYTHON_VERSION=3
+ENV PYTHON_VERSION=${PYTHON_VERSION}
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections \
+    && echo 'tzdata tzdata/Zones/America select Los_Angeles' | debconf-set-selections \
+    && apt-get update -y \
+    && apt-get install -y ccache software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa \
+    && apt-get update -y \
+    && apt-get install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python${PYTHON_VERSION}-venv python3-pip \
+    && if [ "${PYTHON_VERSION}" != "3" ]; then update-alternatives --install /usr/bin/python3 python3 /usr/bin/python${PYTHON_VERSION} 1; fi
 
 RUN apt-get update -y \
     && apt-get install -y python3-pip git curl sudo
```
@@ -16,22 +33,15 @@ RUN apt-get update -y \ | |
| # https://github.com/pytorch/pytorch/issues/107960 -- hopefully | ||
| # this won't be needed for future versions of this docker image | ||
| # or future versions of triton. | ||
| RUN ldconfig /usr/local/cuda-12.4/compat/ | ||
| RUN ldconfig /usr/local/cuda-$(echo $CUDA_VERSION | cut -d. -f1,2)/compat/ | ||
|
|
||
| WORKDIR /workspace | ||
|
|
||
| # install build and runtime dependencies | ||
| COPY requirements-common.txt requirements-common.txt | ||
| COPY requirements-cuda.txt requirements-cuda.txt | ||
| RUN --mount=type=cache,target=/root/.cache/pip \ | ||
| pip install -r requirements-cuda.txt | ||
|
|
||
| # install development dependencies | ||
| COPY requirements-lint.txt requirements-lint.txt | ||
| COPY requirements-test.txt requirements-test.txt | ||
| COPY requirements-dev.txt requirements-dev.txt | ||
| RUN --mount=type=cache,target=/root/.cache/pip \ | ||
| pip install -r requirements-dev.txt | ||
| python${PYTHON_VERSION} -m pip install -r requirements-cuda.txt | ||
|
||
|
|
||
| # cuda arch list used by torch | ||
| # can be useful for both `dev` and `test` | ||
|
|
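The new `ldconfig` line derives the compat directory from the full `CUDA_VERSION` build arg. A standalone shell sketch of just the version-trimming step (not part of the image build):

```shell
# `cut -d. -f1,2` splits on dots and keeps only the first two fields, so the
# full patch version (e.g. 12.4.1) is trimmed to major.minor (12.4), matching
# the /usr/local/cuda-<major.minor>/compat/ layout of the CUDA base images.
CUDA_VERSION=12.4.1
CUDA_MAJOR_MINOR=$(echo "$CUDA_VERSION" | cut -d. -f1,2)
COMPAT_DIR="/usr/local/cuda-${CUDA_MAJOR_MINOR}/compat/"
echo "$COMPAT_DIR"    # /usr/local/cuda-12.4/compat/
```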
@@ -41,14 +51,16 @@ ARG torch_cuda_arch_list='7.0 7.5 8.0 8.6 8.9 9.0+PTX' | |
| ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list} | ||
| #################### BASE BUILD IMAGE #################### | ||
|
|
||
|
|
||
| #################### WHEEL BUILD IMAGE #################### | ||
| FROM dev AS build | ||
| FROM base AS build | ||
|
|
||
| ARG PYTHON_VERSION=3 | ||
|
**Collaborator:** Does a multi-stage Dockerfile support a global `ARG`?

**Author:** I don't think so.

**Contributor:** Just as an FYI: it sort of does. What you have to do is define all your desired global `ARG`s before the first `FROM`, and then add a bare `ARG` redeclaration in each stage that needs the value.
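The pattern the contributor describes can be sketched as follows (a minimal illustration with hypothetical stage contents, not the PR's actual Dockerfile):

```dockerfile
# A global ARG declared before the first FROM is in scope only for FROM lines.
ARG CUDA_VERSION=12.4.1

FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04 AS base
# A bare redeclaration pulls the global default back into this stage's scope.
ARG CUDA_VERSION
RUN echo "base stage sees CUDA ${CUDA_VERSION}"

FROM base AS build
# Scope does not carry across stages; each stage must redeclare the ARG.
ARG CUDA_VERSION
RUN echo "build stage sees CUDA ${CUDA_VERSION}"
```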
```diff
+ENV PYTHON_VERSION=${PYTHON_VERSION}
 
 # install build dependencies
 COPY requirements-build.txt requirements-build.txt
 RUN --mount=type=cache,target=/root/.cache/pip \
-    pip install -r requirements-build.txt
+    python${PYTHON_VERSION} -m pip install -r requirements-build.txt
 
 # install compiler cache to speed up compilation leveraging local or remote caching
 RUN apt-get update -y && apt-get install -y ccache
```
@@ -84,15 +96,15 @@ RUN --mount=type=cache,target=/root/.cache/pip \ | |
| && export SCCACHE_BUCKET=vllm-build-sccache \ | ||
| && export SCCACHE_REGION=us-west-2 \ | ||
| && sccache --show-stats \ | ||
| && python3 setup.py bdist_wheel --dist-dir=dist \ | ||
| && python${PYTHON_VERSION} setup.py bdist_wheel --dist-dir=dist \ | ||
| && sccache --show-stats; \ | ||
| fi | ||
|
|
||
| ENV CCACHE_DIR=/root/.cache/ccache | ||
| RUN --mount=type=cache,target=/root/.cache/ccache \ | ||
| --mount=type=cache,target=/root/.cache/pip \ | ||
| if [ "$USE_SCCACHE" != "1" ]; then \ | ||
| python3 setup.py bdist_wheel --dist-dir=dist; \ | ||
| python${PYTHON_VERSION} setup.py bdist_wheel --dist-dir=dist; \ | ||
| fi | ||
|
|
||
| # check the size of the wheel, we cannot upload wheels larger than 100MB | ||
|
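The two `RUN` steps above are mutually exclusive branches on the `USE_SCCACHE` build arg: the same wheel build runs either through sccache (remote S3 cache) or locally with ccache. A standalone shell sketch of that toggle (the build command is only echoed; bucket and region values are taken from the diff):

```shell
# Select the compiler-cache backend the same way the Dockerfile does.
USE_SCCACHE="${USE_SCCACHE:-0}"
PYTHON_VERSION="${PYTHON_VERSION:-3}"
if [ "$USE_SCCACHE" = "1" ]; then
  export SCCACHE_BUCKET=vllm-build-sccache
  export SCCACHE_REGION=us-west-2
  cache_kind=sccache
else
  cache_kind=ccache
fi
echo "building with ${cache_kind}: python${PYTHON_VERSION} setup.py bdist_wheel --dist-dir=dist"
```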
```diff
@@ -101,9 +113,20 @@ RUN python3 check-wheel-size.py dist
 
 #################### EXTENSION Build IMAGE ####################
 
+#################### DEV IMAGE ####################
+FROM base as dev
+
+COPY requirements-lint.txt requirements-lint.txt
+COPY requirements-test.txt requirements-test.txt
+COPY requirements-dev.txt requirements-dev.txt
+RUN --mount=type=cache,target=/root/.cache/pip \
+    python${PYTHON_VERSION} -m pip install -r requirements-dev.txt
+
+#################### DEV IMAGE ####################
+
 #################### vLLM installation IMAGE ####################
 # image with vLLM installed
-FROM nvidia/cuda:12.4.1-base-ubuntu22.04 AS vllm-base
+FROM nvidia/cuda:${CUDA_VERSION}-base-ubuntu22.04 AS vllm-base
 WORKDIR /vllm-workspace
 
 RUN apt-get update -y \
```
|
@@ -113,7 +136,7 @@ RUN apt-get update -y \ | |
| # https://github.com/pytorch/pytorch/issues/107960 -- hopefully | ||
| # this won't be needed for future versions of this docker image | ||
| # or future versions of triton. | ||
| RUN ldconfig /usr/local/cuda-12.4/compat/ | ||
| RUN ldconfig /usr/local/cuda-$(echo $CUDA_VERSION | cut -d. -f1,2)/compat/ | ||
|
|
||
| # install vllm wheel first, so that torch etc will be installed | ||
| RUN --mount=type=bind,from=build,src=/workspace/dist,target=/vllm-workspace/dist \ | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this repeated? Declared in line 8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one on line 8 is to declare
CUDA_VERSIONsoFROM nvidia/cuda:${CUDA_VERSION}can reference it. This is defined within the build stage so other steps can reference toCUDA_VERSION