
Conversation

@aarnphm
Collaborator

@aarnphm aarnphm commented Mar 15, 2025

Ok, well I ended up doing a few things in this PR:

  • upgrade xgrammar to 0.1.16 to add support for vision models in v1.
  • move/simplify a lot of the metadata logic on our side and use the xgrammar API for this construction

Reasoning:

  • xgrammar updated some core APIs that vLLM depended on in the previous version
  • 0.1.16 simplifies our fix for the v1 vocab_size; this PR restores our previous implementation of relying on the lm_head vocab_size (or the model's vocab_size retrieved from the HF config), as sketched below
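
As a rough illustration of the restored approach, here is a minimal sketch; the model name and exact call shapes are assumptions based on xgrammar's public 0.1.x API, not this PR's literal diff:

import xgrammar as xgr
from transformers import AutoConfig, AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # example model, not from the PR
hf_config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Prefer the model's vocab_size from the HF (text) config, i.e. the lm_head
# size; fall back to the tokenizer's own vocab if the config lacks it.
vocab_size = getattr(hf_config, "vocab_size", None) or len(tokenizer.get_vocab())

# With xgrammar >= 0.1.16 the full vocab size is passed straight through,
# so vLLM no longer needs to rebuild tokenizer metadata itself.
tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer, vocab_size=vocab_size)
compiler = xgr.GrammarCompiler(tokenizer_info)
compiled_grammar = compiler.compile_builtin_json_grammar()
matcher = xgr.GrammarMatcher(compiled_grammar)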

Part of #14832

Signed-off-by: Aaron Pham [email protected]
Co-authored-by: [email protected] [email protected]

@aarnphm aarnphm requested review from mgoin and russellb as code owners March 15, 2025 17:31
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Mar 15, 2025
@aarnphm
Collaborator Author

aarnphm commented Mar 15, 2025

I don't have write access, but I'm currently waiting for xgrammar to publish 0.1.16, which should simplify this codepath for us @robertgshaw2-redhat

@robertgshaw2-redhat
Collaborator

@aarnphm - can you explain this change?

@aarnphm aarnphm changed the title revert: move back to use vocab_size until 0.1.16 is released [Fix][V1][Structured Output] move back to use vocab_size from config Mar 15, 2025
@aarnphm
Collaborator Author

aarnphm commented Mar 15, 2025

Yes, the vocab size here should be the one inferred from hf_text_config (which is the change in this PR).

I'm waiting for xgrammar 0.1.16 to be published so that it supports Olmo and Aria models, which have additional token ids that are not used in lm_head.

The reasoning for get_vocab_size() was to accommodate vision models (which will now be obsolete with 0.1.16).
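
Roughly what "inferred from hf_text_config" means in practice; the model name and the attribute fallback below are illustrative assumptions, with vLLM's own ModelConfig.hf_text_config doing the equivalent:

from transformers import AutoConfig

# For multimodal models the text vocab lives under a nested text config;
# fall back to the top-level config for text-only models.
config = AutoConfig.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")  # example model
text_config = getattr(config, "text_config", config)
print(text_config.vocab_size)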

@robertgshaw2-redhat
Collaborator

Yes, the vocab size here should be the one inferred from hf_text_config (which is the change in this PR).

I'm waiting for xgrammar 0.1.16 to be published so that it supports Olmo and Aria models, which have additional token ids that are not used in lm_head.

The reasoning for get_vocab_size() was to accommodate vision models (which will now be obsolete with 0.1.16).

So do we need to add SupportsV0only to Olmo and Aria?

@simon-mo simon-mo added this to the v0.8.0 milestone Mar 15, 2025
@mergify mergify bot added the ci/build label Mar 16, 2025
@aarnphm
Collaborator Author

aarnphm commented Mar 16, 2025

@robertgshaw2-redhat I updated xgrammar to 0.1.16 now, so let's see if this works

@simon-mo simon-mo added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 16, 2025
@simon-mo
Collaborator

@aarnphm is this good to go?

@aarnphm
Collaborator Author

aarnphm commented Mar 16, 2025

@aarnphm is this good to go?

Yes. We can wait for the tests to pass, then merge this one.

@simon-mo
Collaborator

Ah I saw you are targeting Rob’s branch. Let me know when it is good to merge.

@simon-mo
Collaborator

Looks like it still failed

@aarnphm aarnphm changed the title [Fix][V1][Structured Output] move back to use vocab_size from config [Fix][Structured Output] move back to use vocab_size from config Mar 16, 2025
@aarnphm aarnphm changed the title [Fix][Structured Output] move back to use vocab_size from config [Fix][Structured Output] using vocab_size to construct matcher and upgrade to 0.1.16 Mar 16, 2025
@aarnphm aarnphm changed the title [Fix][Structured Output] using vocab_size to construct matcher and upgrade to 0.1.16 [Fix][Structured Output] using vocab_size to construct matcher Mar 16, 2025
Member

@russellb russellb left a comment

It seems like this has a lot of extra formatting-only changes. That makes stuff harder to review. I'd really rather formatting changes be done separately.

@aarnphm aarnphm force-pushed the fix/v1-structured-output branch from cda25b9 to 46b427c Compare March 16, 2025 18:06
@aarnphm aarnphm requested a review from russellb March 16, 2025 18:06
@aarnphm
Collaborator Author

aarnphm commented Mar 16, 2025

I have reverted the formatter change.

@aarnphm aarnphm force-pushed the fix/v1-structured-output branch from 46b427c to c2e8471 Compare March 16, 2025 18:08
Collaborator

Is this called on the hotpath or during initialization?

Collaborator Author

@aarnphm aarnphm Mar 17, 2025

We only set this up during initialization, in get_local_xgrammar_guided_decoding_logits_processor.

This has the same logic as xgr.Tokenizer.from_huggingface; sadly, they only provide decoded_vocab as byte strings, so we need to construct encoded_vocab here...
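
A minimal sketch of what constructing encoded_vocab can look like; the helper name and padding behavior are assumptions, not the code in this PR:

from transformers import AutoTokenizer

def build_encoded_vocab(tokenizer, vocab_size: int) -> list[str]:
    # Map token id -> token string, leaving ids the tokenizer does not
    # define (e.g. padded lm_head slots) as empty strings.
    vocab = tokenizer.get_vocab()  # token string -> id
    encoded_vocab = [""] * vocab_size
    for token, token_id in vocab.items():
        if token_id < vocab_size:
            encoded_vocab[token_id] = token
    return encoded_vocab

tokenizer = AutoTokenizer.from_pretrained("gpt2")
encoded_vocab = build_encoded_vocab(tokenizer, vocab_size=50257)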

Collaborator

@robertgshaw2-redhat robertgshaw2-redhat Mar 17, 2025

get_local_xgrammar_guided_decoding_logits_processor is called on the hotpath. It might be worth pre-computing these off the hotpath.

Any exploration of this should be done in a different PR.
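
For illustration, one way to keep that work off the hotpath (a sketch with a hypothetical helper, not what vLLM actually does) is to memoize the tokenizer-derived data per model:

from functools import lru_cache

import xgrammar as xgr
from transformers import AutoTokenizer

@lru_cache(maxsize=8)
def cached_tokenizer_info(model_id: str, vocab_size: int) -> xgr.TokenizerInfo:
    # Build TokenizerInfo once per (model, vocab_size) pair and reuse it
    # for every request instead of recomputing it on each call.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return xgr.TokenizerInfo.from_huggingface(tokenizer, vocab_size=vocab_size)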

@robertgshaw2-redhat
Collaborator

robertgshaw2-redhat commented Mar 17, 2025

cc @simon-mo - this is a Release Blocker.

The PR looks good to me pending the question about MistralTokenizer, but I am not an expert on structured generation, so I will let @russellb give the final say.

@aarnphm aarnphm force-pushed the fix/v1-structured-output branch 3 times, most recently from 9d552b2 to 7b34e04 Compare March 17, 2025 04:58
@aarnphm
Collaborator Author

aarnphm commented Mar 17, 2025

Tests pass locally for structured outputs with both v0 and v1, running on an A100.

[email protected] and others added 3 commits March 17, 2025 06:50
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: Aaron Pham <[email protected]>
@aarnphm aarnphm force-pushed the fix/v1-structured-output branch from d248c19 to 223f335 Compare March 17, 2025 06:50
Signed-off-by: Aaron Pham <[email protected]>
@aarnphm aarnphm removed the ready ONLY add when PR is ready to merge/full CI is needed label Mar 17, 2025
@robertgshaw2-redhat robertgshaw2-redhat merged commit c0efdd6 into vllm-project:main Mar 17, 2025
57 checks passed
@aarnphm aarnphm deleted the fix/v1-structured-output branch March 17, 2025 15:42
@pathorn
Contributor

pathorn commented Mar 18, 2025

I was hitting an exception in V1 yesterday (without this PR):

INFO 03-15 09:08:29 [logger.py:39] Received request cmpl-80d6fb517cc646a6993d0921c9ce295f-0: prompt: '', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.6, top_p=0.95, top_k=-1, min_p=0.0, seed=1, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=8192, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=GuidedDecodingParams(json=None, regex=None, choice=None, grammar=None, json_object=True, backend=None, whitespace_pattern=None), extra_args=None), prompt_token_ids: [], lora_request: None, prompt_adapter_request: None.

EngineCore hit an exception: Traceback (most recent call last): 
  File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 330, in run_engine_core
    engine_core.run_busy_loop()
  File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 364, in run_busy_loop
    outputs = step_fn()
              ^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 193, in step
    engine_core_outputs = self.scheduler.update_from_output( 
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
  File "/opt/venv/lib/python3.12/site-packages/vllm/v1/core/scheduler.py", line 621, in update_from_output
    request.structured_output_request.grammar.accept_tokens(  # type: ignore[union-attr]
  File "/opt/venv/lib/python3.12/site-packages/vllm/v1/structured_output/grammar.py", line 57, in accept_tokens
    if not self.matcher.accept_token(token):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   
  File "/opt/venv/lib/python3.12/site-packages/xgrammar/matcher.py", line 220, in accept_token
    return self._handle.accept_token(token_id, debug_print)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [09:09:06] /project/cpp/grammar_matcher.cc:362: Check failed: (token_id >= 0 && token_id < tokenizer_info_.GetVocabSize()) is false: Invalid token id 129207 for GrammarMatcher 

I haven't tried the latest vLLM yet, but I'm curious whether this PR is supposed to fix exceptions related to out-of-bounds tokens (129207 was greater than the vocab_size in config.json but smaller than the lm_head size). If not, I'll try to see how to reproduce it on latest and file an issue.
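
For reference, the failure boils down to a bounds check along these lines (a schematic Python rendering, not xgrammar's actual C++; the 128256 figure is only an example):

def accept_token(token_id: int, matcher_vocab_size: int) -> None:
    # GrammarMatcher rejects any sampled id outside the vocab size it was
    # constructed with.
    if not (0 <= token_id < matcher_vocab_size):
        raise RuntimeError(f"Invalid token id {token_id} for GrammarMatcher")

# If the matcher is built from config.json's smaller vocab_size while the
# lm_head can emit ids up to a larger padded size, an id like 129207 trips
# the check; building the matcher with the lm_head-derived size avoids it.
accept_token(129207, matcher_vocab_size=128256)  # raises RuntimeError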

@aarnphm
Collaborator Author

aarnphm commented Mar 18, 2025

This should fix that issue.

lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…project#14868)

Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: Aaron Pham <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Signed-off-by: Louis Ulmer <[email protected]>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
…project#14868)

Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: Aaron Pham <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…project#14868)

Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: Aaron Pham <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Signed-off-by: Mu Huai <[email protected]>
