
[Bug]: [V1] Molmo/Aria not supported on V1 due to xgrammar #14534

@robertgshaw2-redhat

Description


Your current environment

These models cannot be used on V1 because xgrammar raises a ValueError during engine startup.

🐛 Describe the bug

  • Run the following:
VLLM_USE_V1=1 pytest -s -x models/decoder_only/vision_language/test_models.py -k molmo
VLLM_USE_V1=1 pytest -s -x models/decoder_only/vision_language/test_models.py -k aria
  • Get the following error back:
ERROR 03-10 03:06:35 [core.py:324] EngineCore hit an exception: Traceback (most recent call last):
ERROR 03-10 03:06:35 [core.py:324]   File "/home/rshaw/vllm/vllm/v1/engine/core.py", line 316, in run_engine_core
ERROR 03-10 03:06:35 [core.py:324]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 03-10 03:06:35 [core.py:324]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-10 03:06:35 [core.py:324]   File "/home/rshaw/vllm/vllm/v1/engine/core.py", line 271, in __init__
ERROR 03-10 03:06:35 [core.py:324]     super().__init__(vllm_config, executor_class, log_stats)
ERROR 03-10 03:06:35 [core.py:324]   File "/home/rshaw/vllm/vllm/v1/engine/core.py", line 65, in __init__
ERROR 03-10 03:06:35 [core.py:324]     self.structured_output_manager = StructuredOutputManager(vllm_config)
ERROR 03-10 03:06:35 [core.py:324]                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-10 03:06:35 [core.py:324]   File "/home/rshaw/vllm/vllm/v1/structured_output/__init__.py", line 44, in __init__
ERROR 03-10 03:06:35 [core.py:324]     tokenizer_info = xgr.TokenizerInfo.from_huggingface(
ERROR 03-10 03:06:35 [core.py:324]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-10 03:06:35 [core.py:324]   File "/home/rshaw/vllm/venv/lib/python3.12/site-packages/xgrammar/tokenizer_info.py", line 188, in from_huggingface
ERROR 03-10 03:06:35 [core.py:324]     raise ValueError(msg)
ERROR 03-10 03:06:35 [core.py:324] ValueError: Input vocab_size less than minimum viable vocab size for tokenizer <class 'vllm.transformers_utils.tokenizer.get_cached_tokenizer.<locals>.CachedTokenizer'>.

This seems to be due to the relative sizes of the tokenizer vocabulary and the model vocabulary: vLLM hands the model config's vocab_size to xgrammar, and for these models it is smaller than the minimum the tokenizer requires. A sketch of the failing call is below.
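For illustration, a minimal sketch of the startup call that trips the check. The model name and the deliberately small vocab_size are assumptions for demonstration, not the values vLLM actually computes:

from transformers import AutoTokenizer
import xgrammar as xgr

# Load the tokenizer as vLLM would (model name is illustrative).
tokenizer = AutoTokenizer.from_pretrained(
    "allenai/Molmo-7B-D-0924", trust_remote_code=True
)

# vLLM's StructuredOutputManager passes the model config's vocab_size here.
# Any value below the tokenizer's "minimum viable vocab size" reproduces the
# ValueError in the traceback above; 1000 is deliberately too small.
tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer, vocab_size=1000)

Because StructuredOutputManager is constructed in EngineCoreProc.__init__, this check fires when the engine starts, before any request is served.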

