
Conversation

@aarnphm
Collaborator

@aarnphm aarnphm commented Mar 15, 2025

Ok, well I ended up doing a few things in this PR:

  • upgrade xgrammar to 0.1.16 to add support for vision models in v1.
  • move/simplify a lot of the metadata logic on our side and use the xgrammar API for this construction

Reasoning:

  • xgrammar updated some core APIs that vLLM depended on in the previous version
  • 0.1.16 simplifies our fix for the v1 vocab_size; this PR restores our previous implementation of relying on the lm_head vocab_size (or the model's vocab_size retrieved from the HF config), as sketched below
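
As a rough illustration of the restored approach, here is a minimal sketch; the model name and exact call shapes are assumptions based on xgrammar's public 0.1.x API, not this PR's literal diff:

import xgrammar as xgr
from transformers import AutoConfig, AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # example model, not from the PR
hf_config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Prefer the model's vocab_size from the HF (text) config, i.e. the lm_head
# size; fall back to the tokenizer's own vocab if the config lacks it.
vocab_size = getattr(hf_config, "vocab_size", None) or len(tokenizer.get_vocab())

# With xgrammar >= 0.1.16 the full vocab size is passed straight through,
# so vLLM no longer needs to rebuild tokenizer metadata itself.
tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer, vocab_size=vocab_size)
compiler = xgr.GrammarCompiler(tokenizer_info)
compiled_grammar = compiler.compile_builtin_json_grammar()
matcher = xgr.GrammarMatcher(compiled_grammar)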

Part of #14832

Signed-off-by: Aaron Pham [email protected]
Co-authored-by: [email protected] [email protected]

@aarnphm aarnphm requested review from mgoin and russellb as code owners March 15, 2025 17:31
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Mar 15, 2025
@aarnphm
Collaborator Author

aarnphm commented Mar 15, 2025

I don't have write access, but I'm currently waiting for xgrammar to publish 0.1.16, which should simplify this codepath for us @robertgshaw2-redhat

@robertgshaw2-redhat
Collaborator

@aarnphm - can you explain this change?

@aarnphm aarnphm changed the title revert: move back to use vocab_size until 0.1.16 is released [Fix][V1][Structured Output] move back to use vocab_size from config Mar 15, 2025
@aarnphm
Collaborator Author

aarnphm commented Mar 15, 2025

Yes, the vocab size here should be the one inferred from hf_text_config (which is the change in this PR).

I'm waiting for xgrammar 0.1.16 to be published so that it supports Olmo and Aria models, which have additional token ids that are not used in lm_head.

The reasoning for get_vocab_size() was to accommodate vision models (which will now be obsolete with 0.1.16).
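
Roughly what "inferred from hf_text_config" means in practice; the model name and the attribute fallback below are illustrative assumptions, with vLLM's own ModelConfig.hf_text_config doing the equivalent:

from transformers import AutoConfig

# For multimodal models the text vocab lives under a nested text config;
# fall back to the top-level config for text-only models.
config = AutoConfig.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")  # example model
text_config = getattr(config, "text_config", config)
print(text_config.vocab_size)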

@robertgshaw2-redhat
Collaborator

Yes, the vocab size here should be the one inferred from hf_text_config (which is the change in this PR).

I'm waiting for xgrammar 0.1.16 to be published so that it supports Olmo and Aria models, which have additional token ids that are not used in lm_head.

The reasoning for get_vocab_size() was to accommodate vision models (which will now be obsolete with 0.1.16).

So do we need to add SupportsV0only to Olmo and Aria?

@simon-mo simon-mo added this to the v0.8.0 milestone Mar 15, 2025
@mergify mergify bot added the ci/build label Mar 16, 2025
@aarnphm
Collaborator Author

aarnphm commented Mar 16, 2025

@robertgshaw2-redhat I updated xgrammar to 0.1.16 now, so let's see if this works

@simon-mo simon-mo added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 16, 2025
@simon-mo
Collaborator

@aarnphm is this good to go?

@aarnphm
Collaborator Author

aarnphm commented Mar 16, 2025

@aarnphm is this good to go?

Yes. We can wait for the tests to pass, then merge this one.

@simon-mo
Collaborator

Ah I saw you are targeting Rob’s branch. Let me know when it is good to merge.

@simon-mo
Collaborator

Looks like it still failed

@aarnphm aarnphm changed the title [Fix][V1][Structured Output] move back to use vocab_size from config [Fix][Structured Output] move back to use vocab_size from config Mar 16, 2025
@aarnphm aarnphm changed the title [Fix][Structured Output] move back to use vocab_size from config [Fix][Structured Output] using vocab_size to construct matcher and upgrade to 0.1.16 Mar 16, 2025
@aarnphm aarnphm changed the title [Fix][Structured Output] using vocab_size to construct matcher and upgrade to 0.1.16 [Fix][Structured Output] using vocab_size to construct matcher Mar 16, 2025
Member

@russellb russellb left a comment

It seems like this has a lot of extra formatting-only changes. That makes stuff harder to review. I'd really rather formatting changes be done separately.

@aarnphm aarnphm force-pushed the fix/v1-structured-output branch from cda25b9 to 46b427c Compare March 16, 2025 18:06
@aarnphm aarnphm requested a review from russellb March 16, 2025 18:06
@aarnphm
Collaborator Author

aarnphm commented Mar 16, 2025

I have reverted the formatter change.

@aarnphm aarnphm force-pushed the fix/v1-structured-output branch from 46b427c to c2e8471 Compare March 16, 2025 18:08
Collaborator

Is this called on the hotpath or during initialization?

Collaborator Author

@aarnphm aarnphm Mar 17, 2025

We only set this up during initialization, in get_local_xgrammar_guided_decoding_logits_processor.

This has the same logic as xgr.Tokenizer.from_huggingface; sadly, they only provide decoded_vocab as byte strings, so we need to construct encoded_vocab here...
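
A minimal sketch of what constructing encoded_vocab can look like; the helper name and padding behavior are assumptions, not the code in this PR:

from transformers import AutoTokenizer

def build_encoded_vocab(tokenizer, vocab_size: int) -> list[str]:
    # Map token id -> token string, leaving ids the tokenizer does not
    # define (e.g. padded lm_head slots) as empty strings.
    vocab = tokenizer.get_vocab()  # token string -> id
    encoded_vocab = [""] * vocab_size
    for token, token_id in vocab.items():
        if token_id < vocab_size:
            encoded_vocab[token_id] = token
    return encoded_vocab

tokenizer = AutoTokenizer.from_pretrained("gpt2")
encoded_vocab = build_encoded_vocab(tokenizer, vocab_size=50257)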

Collaborator

@robertgshaw2-redhat robertgshaw2-redhat Mar 17, 2025

get_local_xgrammar_guided_decoding_logits_processor is called on the hotpath. It might be worth pre-computing these off the hotpath.

Any exploration of this should be done in a different PR.
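
For illustration, one way to keep that work off the hotpath (a sketch with a hypothetical helper, not what vLLM actually does) is to memoize the tokenizer-derived data per model:

from functools import lru_cache

import xgrammar as xgr
from transformers import AutoTokenizer

@lru_cache(maxsize=8)
def cached_tokenizer_info(model_id: str, vocab_size: int) -> xgr.TokenizerInfo:
    # Build TokenizerInfo once per (model, vocab_size) pair and reuse it
    # for every request instead of recomputing it on each call.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return xgr.TokenizerInfo.from_huggingface(tokenizer, vocab_size=vocab_size)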

@robertgshaw2-redhat
Collaborator

robertgshaw2-redhat commented Mar 17, 2025

cc @simon-mo - this is a Release Blocker.

The PR looks good to me pending the question about MistralTokenizer, but I am not an expert on structured generation, so I will let @russellb give the final say.

@aarnphm aarnphm force-pushed the fix/v1-structured-output branch 3 times, most recently from 9d552b2 to 7b34e04 Compare March 17, 2025 04:58
@aarnphm
Collaborator Author

aarnphm commented Mar 17, 2025

Tests pass locally for structured outputs with both v0 and v1, running on an A100.

[email protected] and others added 3 commits March 17, 2025 06:50
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: Aaron Pham <[email protected]>
@aarnphm aarnphm force-pushed the fix/v1-structured-output branch from d248c19 to 223f335 Compare March 17, 2025 06:50
Signed-off-by: Aaron Pham <[email protected]>
@aarnphm aarnphm removed the ready ONLY add when PR is ready to merge/full CI is needed label Mar 17, 2025
@robertgshaw2-redhat robertgshaw2-redhat merged commit c0efdd6 into vllm-project:main Mar 17, 2025
57 checks passed
@aarnphm aarnphm deleted the fix/v1-structured-output branch March 17, 2025 15:42
@pathorn
Contributor

pathorn commented Mar 18, 2025

I was hitting an exception in V1 yesterday (without this PR):

INFO 03-15 09:08:29 [logger.py:39] Received request cmpl-80d6fb517cc646a6993d0921c9ce295f-0: prompt: '', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.6, top_p=0.95, top_k=-1, min_p=0.0, seed=1, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=8192, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=GuidedDecodingParams(json=None, regex=None, choice=None, grammar=None, json_object=True, backend=None, whitespace_pattern=None), extra_args=None), prompt_token_ids: [], lora_request: None, prompt_adapter_request: None.

EngineCore hit an exception: Traceback (most recent call last): 
  File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 330, in run_engine_core
    engine_core.run_busy_loop()
  File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 364, in run_busy_loop
    outputs = step_fn()
              ^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 193, in step
    engine_core_outputs = self.scheduler.update_from_output( 
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
  File "/opt/venv/lib/python3.12/site-packages/vllm/v1/core/scheduler.py", line 621, in update_from_output
    request.structured_output_request.grammar.accept_tokens(  # type: ignore[union-attr]
  File "/opt/venv/lib/python3.12/site-packages/vllm/v1/structured_output/grammar.py", line 57, in accept_tokens
    if not self.matcher.accept_token(token):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   
  File "/opt/venv/lib/python3.12/site-packages/xgrammar/matcher.py", line 220, in accept_token
    return self._handle.accept_token(token_id, debug_print)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [09:09:06] /project/cpp/grammar_matcher.cc:362: Check failed: (token_id >= 0 && token_id < tokenizer_info_.GetVocabSize()) is false: Invalid token id 129207 for GrammarMatcher 

I haven't tried the latest vLLM yet, but I'm curious whether this PR is supposed to fix exceptions related to out-of-bounds tokens (129207 was greater than the vocab_size in config.json but smaller than the lm_head size). If not, I'll try to see how to reproduce it on latest and file an issue.
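
For reference, the failure boils down to a bounds check along these lines (a schematic Python rendering, not xgrammar's actual C++; the 128256 figure is only an example):

def accept_token(token_id: int, matcher_vocab_size: int) -> None:
    # GrammarMatcher rejects any sampled id outside the vocab size it was
    # constructed with.
    if not (0 <= token_id < matcher_vocab_size):
        raise RuntimeError(f"Invalid token id {token_id} for GrammarMatcher")

# If the matcher is built from config.json's smaller vocab_size while the
# lm_head can emit ids up to a larger padded size, an id like 129207 trips
# the check; building the matcher with the lm_head-derived size avoids it.
accept_token(129207, matcher_vocab_size=128256)  # raises RuntimeError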

@aarnphm
Collaborator Author

aarnphm commented Mar 18, 2025

This should fix that issue.

lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…project#14868)

Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: Aaron Pham <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Signed-off-by: Louis Ulmer <[email protected]>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
…project#14868)

Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: Aaron Pham <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…project#14868)

Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: Aaron Pham <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Signed-off-by: Mu Huai <[email protected]>
