
Conversation


@amacaskill amacaskill commented Nov 4, 2025

Purpose

The purpose of this PR is to add a basic RunAI model streamer e2e test that pulls from a public GCS bucket, and to add it to the CI pipeline to prevent regressions in the RunAI model streamer. Examples of past RunAI model streamer regressions that this e2e test would have caught:

We also need code coverage for vllm/config/model.py

This test uses a small model, codegemma-2b, which is around 5.6 GiB.

Test Plan

We plan to enable all of the RunAI model streamer tests in the CI pipeline. Specifically, we want to run the tests on changes to any of the following:

  source_file_dependencies:
  - vllm/engine
  - vllm/model_executor/model_loader
  - tests/model_executor/model_loader/runai_model_streamer

We also want to run the tests in the torch nightly build.

We didn't add this to Model Executor Test because we would like to run the RunAI model streamer tests on changes to vllm/engine, tests/model_executor/model_loader/runai_model_streamer, and on the nightly build (which Model Executor Test doesn't currently do).
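In vLLM's CI these dependencies are declared in the Buildkite test pipeline config. An illustrative sketch of what the new entry might look like, built from the paths and pytest command quoted in this PR description (the label and exact layout here are assumptions, not the merged config):

```yaml
- label: RunAI Model Streamer Test
  source_file_dependencies:
  - vllm/engine
  - vllm/model_executor/model_loader
  - tests/model_executor/model_loader/runai_model_streamer
  commands:
  - pytest -s -v tests/model_executor/model_loader/runai_model_streamer
```

With `source_file_dependencies`, the step only runs on PRs that touch one of the listed paths, while nightly builds run it unconditionally.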

Test Result

# All runai_model_streamer tests pass (including new test I added)
pytest -s -v tests/model_executor/model_loader/runai_model_streamer
=========================================== 7 passed, 3 warnings in 76.94s (0:01:16) ===========================================

# The new test I added passes: 
pytest -s -v tests/model_executor/model_loader/runai_model_streamer/test_runai_model_streamer_loader.py::test_runai_model_loader_download_gcs_files
PASSED
======================================================= warnings summary =======================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305
  /root/vllm/.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================ 1 passed, 3 warnings in 47.99s ================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
(vllm) root@vllm-g2-vm:~/vllm# 

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


github-actions bot commented Nov 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@amacaskill amacaskill force-pushed the runai-streamer-tests-2 branch from b2996e3 to 9f5e8a1 Compare November 4, 2025 23:42
@mergify mergify bot added the ci/build label Nov 4, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds an end-to-end test for the RunAI model streamer using a model from a public GCS bucket. The change is straightforward and helps prevent regressions. I've suggested strengthening the test's assertion to make it more robust in catching potential issues.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@amacaskill amacaskill force-pushed the runai-streamer-tests-2 branch 6 times, most recently from 5f499c9 to 89c9529 Compare November 5, 2025 00:06
@amacaskill

Hi @22quinn @rahul-tuli @DarkLight1337 Would one of you be able to review this PR?

@amacaskill amacaskill force-pushed the runai-streamer-tests-2 branch from 89c9529 to a55cd5c Compare November 5, 2025 17:00
@amacaskill

@DarkLight1337 I have another question. Say there is a refactor of the RunAI model streamer, so that logic which could break the streamer now lives in another file (file X), and file X isn't listed in source_file_dependencies. If someone breaks the RunAI model streamer logic in file X, the RunAI model streamer test will not run on the PR presubmit; it will run for the first time in the torch nightly build, and fail.

When a test fails in torch nightly, what happens? Who is responsible for fixing that failure? Does the failure block the nightly build and/or the next vllm release? Or does vllm just proceed with the release/build, with that feature broken?

@DarkLight1337

When a test fails in torch nightly, what happens? Who is responsible for fixing that failure? Does the failure block the nightly build and/or the next vllm release? Or does vllm just proceed with the release/build, with that feature broken?

Nightly failures are not blocking, but we try to fix as many as possible before releasing.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 5, 2025 17:32
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 5, 2025

amacaskill commented Nov 5, 2025

The new test, test_runai_model_loader_download_files_gcs, fails in CI with google.auth.exceptions.DefaultCredentialsError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information. This didn't happen for me locally, probably because I have ADC by default on my GCE VM. To revoke ADC for my GCE VM, I recreated the VM with no-service-account so it doesn't use the GCE Compute Engine default SA (which plugs into ADC). The failure happens within _create_client() in the RunAI model streamer repo.

To fix this, I tried adding monkeypatch.setenv("RUNAI_STREAMER_GCS_USE_ANONYMOUS_CREDENTIALS", "true") to the test, but this failed because storage.Client() still requires a project and other settings, even when it is passed anonymous credentials. To fix this, I think I need to change _create_client() to return _create_anonymous_client() when credentials.credential_type == CredentialType.ANONYMOUS_CREDENTIALS:

def _create_client() -> storage.client.Client:
    credentials = get_credentials()
    if credentials.credential_type == CredentialType.ANONYMOUS_CREDENTIALS:
        return storage.Client.create_anonymous_client()
    return storage.Client(credentials=credentials.gcp_credentials())
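As a pure-Python illustration of that branching, here is a minimal self-contained sketch with stub classes standing in for the google-cloud-storage client and the streamer's credential helpers (the names `StubClient`, `create_client`, and the `Credentials` dataclass are illustrative, not the real API):

```python
from dataclasses import dataclass
from enum import Enum, auto


class CredentialType(Enum):
    # Only the two cases relevant to this discussion.
    ANONYMOUS_CREDENTIALS = auto()
    DEFAULT_CREDENTIALS = auto()


@dataclass
class Credentials:
    credential_type: CredentialType


class StubClient:
    """Stand-in for google.cloud.storage.Client."""

    def __init__(self, anonymous: bool = False) -> None:
        self.anonymous = anonymous

    @classmethod
    def create_anonymous_client(cls) -> "StubClient":
        # Mirrors storage.Client.create_anonymous_client(), which builds
        # a client without resolving a project or ADC.
        return cls(anonymous=True)


def create_client(credentials: Credentials) -> StubClient:
    # Route anonymous credentials around the normal constructor, which
    # would otherwise still try to resolve a project and default creds.
    if credentials.credential_type is CredentialType.ANONYMOUS_CREDENTIALS:
        return StubClient.create_anonymous_client()
    return StubClient()
```

The point of the branch is that the anonymous path never touches the ADC machinery, so it works on machines with no service account attached.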

I tested this locally, but it fails in a weird place:

(EngineCore_DP0 pid=16176)   File "/root/vllm/vllm/model_executor/model_loader/weight_utils.py", line 663, in runai_safetensors_weights_iterator
(EngineCore_DP0 pid=16176)     streamer.stream_files(hf_weights_files)
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/safetensors_streamer/safetensors_streamer.py", line 78, in stream_files
(EngineCore_DP0 pid=16176)     safetensors_metadatas = safetensors_pytorch.prepare_request(self.file_streamer, paths, s3_credentials)
(EngineCore_DP0 pid=16176)                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/safetensors_streamer/safetensors_pytorch.py", line 105, in prepare_request
(EngineCore_DP0 pid=16176)     safetensors_metadatas = SafetensorsMetadata.from_files(fs, paths, s3_credentials)
(EngineCore_DP0 pid=16176)                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/safetensors_streamer/safetensors_pytorch.py", line 58, in from_files
(EngineCore_DP0 pid=16176)     for file_index, ready_chunk_index, buffer in fs.get_chunks():
(EngineCore_DP0 pid=16176)                                                  ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/distributed_streamer/distributed_streamer.py", line 141, in get_chunks
(EngineCore_DP0 pid=16176)     for item in self.file_streamer.get_chunks():
(EngineCore_DP0 pid=16176)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/file_streamer/file_streamer.py", line 121, in get_chunks
(EngineCore_DP0 pid=16176)     yield from self.request_ready_chunks()
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/file_streamer/file_streamer.py", line 142, in request_ready_chunks
(EngineCore_DP0 pid=16176)     file_relative_index, chunk_relative_index = runai_response(self.streamer)
(EngineCore_DP0 pid=16176)                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16176)   File "/root/vllm/.venv/lib/python3.12/site-packages/runai_model_streamer/libstreamer/libstreamer.py", line 93, in runai_response
(EngineCore_DP0 pid=16176)     raise Exception(
(EngineCore_DP0 pid=16176) Exception: Could not receive runai_response from libstreamer due to: b'File access error'

Maybe this is because the C++ code doesn't support anonymous GCS access? I see there is a TODO. I also tried setting up a fake RUNAI_STREAMER_GCS_CREDENTIAL_FILE, but the google auth library was too smart and it failed earlier. I think I either need to (1) fix the C++ code to support anonymous GCS access, or (2) change the test to set up ADC credentials and set GOOGLE_APPLICATION_CREDENTIALS. Leaning toward (2).

auto-merge was automatically disabled November 5, 2025 19:32

Head branch was pushed to by a user without write access

@amacaskill amacaskill force-pushed the runai-streamer-tests-2 branch 9 times, most recently from 8e8de9c to 4ea7f8e Compare November 7, 2025 00:28
@amacaskill amacaskill force-pushed the runai-streamer-tests-2 branch from 4ea7f8e to f23b98f Compare November 7, 2025 00:53
@amacaskill amacaskill force-pushed the runai-streamer-tests-2 branch from f23b98f to e0e1089 Compare November 7, 2025 01:06

amacaskill commented Nov 7, 2025

Maybe this is because the C++ code doesn't support anonymous GCS access? I see there is a TODO. I also tried setting up a fake RUNAI_STREAMER_GCS_CREDENTIAL_FILE, but the google auth library was too smart and it failed earlier. I think I either need to (1) fix the C++ code to support anonymous GCS access, or (2) change the test to set up ADC credentials and set GOOGLE_APPLICATION_CREDENTIALS. Leaning toward (2).

After much trial and error, I fixed the test to correctly use the anonymous credentials, and now the test is passing.
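For reference, a minimal sketch of scoping the RUNAI_STREAMER_GCS_USE_ANONYMOUS_CREDENTIALS flag to a test block. The env var name comes from the discussion above; the context manager itself is illustrative (in pytest, monkeypatch.setenv performs this save-and-restore automatically):

```python
import os
from contextlib import contextmanager

ANON_FLAG = "RUNAI_STREAMER_GCS_USE_ANONYMOUS_CREDENTIALS"


@contextmanager
def anonymous_gcs_env():
    # Enable anonymous GCS credentials for the duration of a block,
    # then restore whatever value (or absence) was there before, so
    # the flag never leaks into other tests.
    old = os.environ.get(ANON_FLAG)
    os.environ[ANON_FLAG] = "true"
    try:
        yield
    finally:
        if old is None:
            os.environ.pop(ANON_FLAG, None)
        else:
            os.environ[ANON_FLAG] = old
```

This matters in CI because the flag must be set before the streamer resolves credentials, and must not affect unrelated tests running in the same process.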

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 7, 2025 03:02
@DarkLight1337 DarkLight1337 merged commit a47d94f into vllm-project:main Nov 7, 2025
19 checks passed
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed
