
Conversation


@amacaskill amacaskill commented Nov 4, 2025

Purpose

The purpose of this PR is to add a basic RunAI model streamer e2e test that pulls from a public GCS bucket, and to add it to the CI pipeline to prevent regressions in the RunAI model streamer. Examples of past RunAI model streamer regressions that this e2e test would have caught:

We also need code coverage for vllm/config/model.py

This test uses a small model, codegemma-2b, which is around 5.6 GiB.

Test Plan

We plan to enable all of the RunAI model streamer tests in the CI pipeline. Specifically, we want to run the tests on changes to any of the following:

  source_file_dependencies:
  - vllm/engine
  - vllm/model_executor/model_loader
  - tests/model_executor/model_loader/runai_model_streamer

We also want to run the tests in the torch nightly build.

We didn't add this to Model Executor Test because we would like to run the RunAI model streamer tests on changes to vllm/engine, tests/model_executor/model_loader/runai_model_streamer, and on the nightly build (which Model Executor Test doesn't currently do).
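In vLLM's CI these dependencies are declared in the Buildkite test pipeline config. An illustrative sketch of what the new entry might look like, built from the paths and pytest command quoted in this PR description (the label and exact layout here are assumptions, not the merged config):

```yaml
- label: RunAI Model Streamer Test
  source_file_dependencies:
  - vllm/engine
  - vllm/model_executor/model_loader
  - tests/model_executor/model_loader/runai_model_streamer
  commands:
  - pytest -s -v tests/model_executor/model_loader/runai_model_streamer
```

With `source_file_dependencies`, the step only runs on PRs that touch one of the listed paths, while nightly builds run it unconditionally.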

Test Result

# All runai_model_streamer tests pass (including new test I added)
pytest -s -v tests/model_executor/model_loader/runai_model_streamer
=========================================== 7 passed, 3 warnings in 76.94s (0:01:16) ===========================================

# The new test I added passes: 
pytest -s -v tests/model_executor/model_loader/runai_model_streamer/test_runai_model_streamer_loader.py::test_runai_model_loader_download_gcs_files
PASSED
======================================================= warnings summary =======================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305
  /root/vllm/.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================ 1 passed, 3 warnings in 47.99s ================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
(vllm) root@vllm-g2-vm:~/vllm# 

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


github-actions bot commented Nov 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@amacaskill amacaskill force-pushed the runai-streamer-tests-2 branch from b2996e3 to 9f5e8a1 Compare November 4, 2025 23:42
@mergify mergify bot added the ci/build label Nov 4, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds an end-to-end test for the RunAI model streamer using a model from a public GCS bucket. The change is straightforward and helps prevent regressions. I've suggested strengthening the test's assertion to make it more robust in catching potential issues.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@amacaskill amacaskill force-pushed the runai-streamer-tests-2 branch 6 times, most recently from 5f499c9 to 89c9529 Compare November 5, 2025 00:06
@amacaskill

Hi @22quinn @rahul-tuli @DarkLight1337 Would one of you be able to review this PR?

@amacaskill amacaskill force-pushed the runai-streamer-tests-2 branch from 89c9529 to a55cd5c Compare November 5, 2025 17:00
@amacaskill

@DarkLight1337 I have another question. Say there is a refactor of the RunAI model streamer, so that logic which could break the streamer now lives in another file (file X), and file X isn't listed in source_file_dependencies. If someone breaks the RunAI model streamer logic in file X, the RunAI model streamer test will not run on the PR presubmit; it will run for the first time in the torch nightly build, and fail.

When a test fails in torch nightly, what happens? Who is responsible for fixing that failure? Does the failure block the nightly build and/or the next vllm release? Or does vllm just proceed with the release/build, with that feature broken?

@DarkLight1337

When a test fails in torch nightly, what happens? Who is responsible for fixing that failure? Does the failure block the nightly build and/or the next vllm release? Or does vllm just proceed with the release/build, with that feature broken?

Nightly failures are not blocking, but we try to fix as many as possible before releasing.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 5, 2025 17:32
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 5, 2025

amacaskill commented Nov 5, 2025

The new test, test_runai_model_loader_download_files_gcs, fails in CI with google.auth.exceptions.DefaultCredentialsError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information. This didn't happen for me locally, probably because I have ADC by default on my GCE VM. To revoke ADC for my GCE VM, I recreated the VM with no-service-account so it doesn't use the GCE Compute Engine default SA (which plugs into ADC). The failure happens within _create_client() in the RunAI model streamer repo.

To fix this, I tried adding monkeypatch.setenv("RUNAI_STREAMER_GCS_USE_ANONYMOUS_CREDENTIALS", "true") to the test, but this failed because storage.Client() still requires a project and other settings, even when it is passed anonymous credentials. To fix this, I think I need to change _create_client() to return _create_anonymous_client() when credentials.credential_type == CredentialType.ANONYMOUS_CREDENTIALS:

def _create_client() -> storage.client.Client:
    credentials = get_credentials()
    if credentials.credential_type == CredentialType.ANONYMOUS_CREDENTIALS:
        return storage.Client.create_anonymous_client()
    return storage.Client(credentials=credentials.gcp_credentials())
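As a pure-Python illustration of that branching, here is a minimal self-contained sketch with stub classes standing in for the google-cloud-storage client and the streamer's credential helpers (the names `StubClient`, `create_client`, and the `Credentials` dataclass are illustrative, not the real API):

```python
from dataclasses import dataclass
from enum import Enum, auto


class CredentialType(Enum):
    # Only the two cases relevant to this discussion.
    ANONYMOUS_CREDENTIALS = auto()
    DEFAULT_CREDENTIALS = auto()


@dataclass
class Credentials:
    credential_type: CredentialType


class StubClient:
    """Stand-in for google.cloud.storage.Client."""

    def __init__(self, anonymous: bool = False) -> None:
        self.anonymous = anonymous

    @classmethod
    def create_anonymous_client(cls) -> "StubClient":
        # Mirrors storage.Client.create_anonymous_client(), which builds
        # a client without resolving a project or ADC.
        return cls(anonymous=True)


def create_client(credentials: Credentials) -> StubClient:
    # Route anonymous credentials around the normal constructor, which
    # would otherwise still try to resolve a project and default creds.
    if credentials.credential_type is CredentialType.ANONYMOUS_CREDENTIALS:
        return StubClient.create_anonymous_client()
    return StubClient()
```

The point of the branch is that the anonymous path never touches the ADC machinery, so it works on machines with no service account attached.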

I tested this locally, but it fails in a weird place:

(EngineCore_DP0 pid=16176)   File "/root/vllm/vllm/model_executor/model_loader/weight_utils.py", line 663, in runai_safetensors_weights_iterator
(EngineCore_DP0 pid=16176)     streamer.stream_files(hf_weights_files)
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/safetensors_streamer/safetensors_streamer.py", line 78, in stream_files
(EngineCore_DP0 pid=16176)     safetensors_metadatas = safetensors_pytorch.prepare_request(self.file_streamer, paths, s3_credentials)
(EngineCore_DP0 pid=16176)                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/safetensors_streamer/safetensors_pytorch.py", line 105, in prepare_request
(EngineCore_DP0 pid=16176)     safetensors_metadatas = SafetensorsMetadata.from_files(fs, paths, s3_credentials)
(EngineCore_DP0 pid=16176)                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/safetensors_streamer/safetensors_pytorch.py", line 58, in from_files
(EngineCore_DP0 pid=16176)     for file_index, ready_chunk_index, buffer in fs.get_chunks():
(EngineCore_DP0 pid=16176)                                                  ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/distributed_streamer/distributed_streamer.py", line 141, in get_chunks
(EngineCore_DP0 pid=16176)     for item in self.file_streamer.get_chunks():
(EngineCore_DP0 pid=16176)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/file_streamer/file_streamer.py", line 121, in get_chunks
(EngineCore_DP0 pid=16176)     yield from self.request_ready_chunks()
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/file_streamer/file_streamer.py", line 142, in request_ready_chunks
(EngineCore_DP0 pid=16176)     file_relative_index, chunk_relative_index = runai_response(self.streamer)
(EngineCore_DP0 pid=16176)                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/libstreamer/libstreamer.py", line 93, in runai_response
(EngineCore_DP0 pid=16176)     raise Exception(
(EngineCore_DP0 pid=16176) Exception: Could not receive runai_response from libstreamer due to: b'File access error'

Maybe this is because the C++ code doesn't support anonymous GCS access? I see there is a TODO. I also tried setting up a fake RUNAI_STREAMER_GCS_CREDENTIAL_FILE, but the google auth library was too smart and it failed earlier. I think I either need to (1) fix the C++ code to support anonymous GCS access, or (2) change the test to set up ADC credentials and set GOOGLE_APPLICATION_CREDENTIALS. Leaning toward (2).

auto-merge was automatically disabled November 5, 2025 19:32

Head branch was pushed to by a user without write access

@amacaskill amacaskill force-pushed the runai-streamer-tests-2 branch 9 times, most recently from 8e8de9c to 4ea7f8e Compare November 7, 2025 00:28
@amacaskill amacaskill force-pushed the runai-streamer-tests-2 branch from 4ea7f8e to f23b98f Compare November 7, 2025 00:53
@amacaskill amacaskill force-pushed the runai-streamer-tests-2 branch from f23b98f to e0e1089 Compare November 7, 2025 01:06

amacaskill commented Nov 7, 2025

Maybe this is because the C++ code doesn't support anonymous GCS access? I see there is a TODO. I also tried setting up a fake RUNAI_STREAMER_GCS_CREDENTIAL_FILE, but the google auth library was too smart and it failed earlier. I think I either need to (1) fix the C++ code to support anonymous GCS access, or (2) change the test to set up ADC credentials and set GOOGLE_APPLICATION_CREDENTIALS. Leaning toward (2).

After much trial and error, I fixed the test to correctly use the anonymous credentials, and now the test is passing.
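For reference, a minimal sketch of scoping the RUNAI_STREAMER_GCS_USE_ANONYMOUS_CREDENTIALS flag to a test block. The env var name comes from the discussion above; the context manager itself is illustrative (in pytest, monkeypatch.setenv performs this save-and-restore automatically):

```python
import os
from contextlib import contextmanager

ANON_FLAG = "RUNAI_STREAMER_GCS_USE_ANONYMOUS_CREDENTIALS"


@contextmanager
def anonymous_gcs_env():
    # Enable anonymous GCS credentials for the duration of a block,
    # then restore whatever value (or absence) was there before, so
    # the flag never leaks into other tests.
    old = os.environ.get(ANON_FLAG)
    os.environ[ANON_FLAG] = "true"
    try:
        yield
    finally:
        if old is None:
            os.environ.pop(ANON_FLAG, None)
        else:
            os.environ[ANON_FLAG] = old
```

This matters in CI because the flag must be set before the streamer resolves credentials, and must not affect unrelated tests running in the same process.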

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 7, 2025 03:02
@DarkLight1337 DarkLight1337 merged commit a47d94f into vllm-project:main Nov 7, 2025
19 checks passed
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed
