[Fix][Structured Output] using vocab_size to construct matcher #14868
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced subset of CI checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
I don't have write access, but I'm currently waiting for xgrammar to publish 0.1.16, which should simplify this codepath for us @robertgshaw2-redhat
@aarnphm - can you explain this change? |
Yes, so the vocab size here should be the one inferred from hf_text_config (which in this case is the change in this PR). I'm waiting for xgrammar 0.1.16 to be published such that it will support OLMo and Aria models, where they have additional token_ids that are not used in lm_head. reasoning for
So do we need to add |
@robertgshaw2-redhat I updated xgrammar to 0.1.16 now, so let's see if this works |
@aarnphm is this good to go? |
Yes, we can wait for the tests to pass, then merge this one.
Ah I saw you are targeting Rob’s branch. Let me know when it is good to merge. |
Looks like it still failed |
russellb left a comment:
It seems like this has a lot of extra formatting-only changes. That makes stuff harder to review. I'd really rather formatting changes be done separately.
I have reverted the formatter change. |
Is this called on the hotpath or during initialization?
We only set this up during initialisation, in get_local_xgrammar_guided_decoding_logits_processor.
This has the same logic as xgr.Tokenizer.from_huggingface; sadly they only have decoded_vocab as byte strings, so we need to construct encoded_vocab here...
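Roughly what "constructing encoded_vocab here" looks like, as a minimal sketch (not the PR's exact code; the model name is only an example):

```python
from transformers import AutoTokenizer

# Example tokenizer; any HF tokenizer with get_vocab() works the same way.
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B-hf")

# get_vocab() maps token string -> token id; sorting by id yields the
# "encoded" vocab (raw BPE pieces, e.g. with Ġ markers) indexed by token id,
# which is the form xgrammar expects when building its tokenizer info.
encoded_vocab = [
    token
    for token, _ in sorted(tokenizer.get_vocab().items(), key=lambda x: x[1])
]
```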
get_local_xgrammar_guided_decoding_logits_processor is called on the hotpath. It might be worth pre-computing these off the hotpath.
Any exploration of this should be done in a different PR.
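If that pre-computation were explored, one hypothetical shape (illustrative only, not vLLM's actual API) would be to cache the expensive per-tokenizer data off the hotpath:

```python
from functools import lru_cache


@lru_cache(maxsize=32)
def _cached_encoded_vocab(tokenizer):
    # Hypothetical helper: HF tokenizers hash by object identity, so repeated
    # calls with the same tokenizer instance reuse the result instead of
    # rebuilding the encoded vocab inside the logits-processor hotpath.
    return tuple(
        token
        for token, _ in sorted(tokenizer.get_vocab().items(), key=lambda x: x[1])
    )
```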
Tests pass locally with both V0 and V1 for structured outputs, running on an A100.
I was hitting an exception in V1 yesterday (without this PR). I haven't tried the latest vLLM yet, but I'm curious if this PR is supposed to fix exceptions related to out-of-bounds tokens (129207 was greater than the vocab_size in config.json but smaller than the lm_head size). If not, I'll try to see how to reproduce it on latest and file an issue.
This should fix that issue. |
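For context on why that exception happens, a toy illustration; the sizes below are made up and only match the relationship described above (token id above the config.json vocab_size but below the lm_head size):

```python
import torch

config_vocab_size = 129_000  # hypothetical value from config.json
lm_head_size = 129_280       # hypothetical padded lm_head / logits dimension
token_id = 129_207           # the sampled id reported above

# A mask or matcher sized by config_vocab_size cannot index this token id:
mask = torch.zeros(config_vocab_size, dtype=torch.bool)
# mask[token_id] = True  # -> IndexError: index 129207 is out of bounds

# Sizing by the full lm_head vocab size accepts it:
mask = torch.zeros(lm_head_size, dtype=torch.bool)
mask[token_id] = True
```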
Ok, well I ended up doing a few things in this PR:
Reasoning:
- core API, which vLLM was depending on in previous versions
- lm_head vocab_size (or the model's vocab_size retrieved from the HF config)

Part of #14832
Signed-off-by: Aaron Pham [email protected]
Co-authored-by: [email protected] [email protected]
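Putting it together, a hedged sketch of the construction the PR title describes: building the xgrammar matcher from the full model vocab size rather than the tokenizer's vocab length. It assumes xgrammar ≥ 0.1.16 and that TokenizerInfo.from_huggingface accepts an explicit vocab_size; the function and parameter names are illustrative, not the PR's exact code:

```python
import xgrammar as xgr


def build_grammar_compiler(tokenizer, model_vocab_size: int) -> xgr.GrammarCompiler:
    # model_vocab_size stands in for the lm_head vocab_size (or the vocab_size
    # from the HF text config), which may exceed len(tokenizer.get_vocab())
    # for models such as OLMo or Aria.
    tokenizer_info = xgr.TokenizerInfo.from_huggingface(
        tokenizer, vocab_size=model_vocab_size
    )
    return xgr.GrammarCompiler(tokenizer_info)
```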