A Stealthy Denial-of-Service Attack by Inducing Pathologically Long Reasoning in Large Reasoning Models
Large reasoning models (LRMs) extend large language models with explicit multi-step reasoning traces, but this capability introduces a new class of prompt-induced inference-time denial-of-service (PI-DoS) attacks that exploit the high computational cost of reasoning. We first formalize inference cost for LRMs and define PI-DoS, then prove that any practical PI-DoS attack must satisfy three properties: (1) a high amplification ratio, where each query induces a disproportionately long reasoning trace relative to its own length; (2) stealthiness, in which prompts and responses remain on the natural-language manifold and evade distribution-shift detectors; and (3) optimizability, in which the attack supports efficient optimization without being slowed by its own success.
Under this framework, we present ReasoningBomb, a reinforcement-learning-based PI-DoS framework that trains a large-reasoning-model attacker to generate short, natural prompts that drive victim LRMs into pathologically long reasoning. ReasoningBomb uses a two-stage pipeline combining supervised fine-tuning under a strict token budget with GRPO-based reinforcement learning, guided by a constant-time surrogate reward computed from victim-model hidden states via a lightweight MLP. Across seven open-source models and three commercial LRMs, ReasoningBomb induces an average of 18,759 completion tokens and 19,263 reasoning tokens, surpassing the runner-up baseline by 35% in completion tokens and 38% in reasoning tokens, while achieving a 286.7× input-to-output amplification ratio and a 98.4% bypass rate against dual-stage detection.
The cumulative inference-time computational cost of serving LRMs has grown to rival or exceed one-time training costs at scale. For an input of $L_{\mathrm{in}}$ tokens, the provider's per-request cost grows as $C_{\mathrm{req}} \approx \kappa (L_{\mathrm{in}} + L_{\mathrm{rp}} + L_{\mathrm{out}})$, where the reasoning length $L_{\mathrm{rp}}$ can substantially exceed the output length $L_{\mathrm{out}}$. This cost structure interacts poorly with subscription-based pricing (e.g., ChatGPT Plus, SuperGrok), where users pay a fixed fee while providers bear variable costs.
OpenAI CEO Sam Altman noted that users being extra polite and verbose with ChatGPT was "costing OpenAI tens of millions of dollars" in additional inference cost.
PI-DoS reflects the "Unbounded Consumption" risk in the OWASP Top 10 for LLM Applications (2025), where adversaries exploit adaptive computation to inflate per-request cost and degrade service for legitimate users.
We consider adversaries with legitimate access to LRM services via subscription tiers (e.g., ChatGPT Plus, SuperGrok) that enforce message-rate quotas rather than per-token billing. These pricing schemes create a fundamental economic asymmetry: while users pay a fixed fee, providers bear variable costs $C_{\mathrm{req}} = \kappa (L_{\mathrm{in}} + L_{\mathrm{rp}} + L_{\mathrm{out}})$ that scale sharply with reasoning trace length. The adversary's goal is to craft prompts that drive per-request cost far beyond benign traffic by inducing pathologically long reasoning traces. To amplify impact, adversaries can maintain multiple subscriptions in parallel, distributing adversarial prompts to circumvent per-account rate limits while forcing providers to process many high-cost requests simultaneously.
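To make the asymmetry concrete, the sketch below plugs hypothetical token counts and a hypothetical per-token cost κ into the linear cost model above and computes the resulting amplification ratio; all numbers are illustrative assumptions, not measured values.

```python
# Minimal sketch of the per-request cost model C_req = kappa * (L_in + L_rp + L_out)
# and the input-to-output amplification ratio. All numbers are hypothetical.

def per_request_cost(l_in: int, l_rp: int, l_out: int, kappa: float) -> float:
    """Provider-side cost of one request under the linear cost model."""
    return kappa * (l_in + l_rp + l_out)

def amplification_ratio(l_in: int, l_rp: int, l_out: int) -> float:
    """Generated tokens (reasoning + answer) per input token."""
    return (l_rp + l_out) / l_in

# A benign query: short prompt, moderate reasoning trace.
benign = per_request_cost(l_in=200, l_rp=1_000, l_out=300, kappa=1e-6)

# An adversarial query: an equally short prompt, pathologically long reasoning.
attack = per_request_cost(l_in=60, l_rp=17_000, l_out=500, kappa=1e-6)

print(f"benign cost per request: {benign:.4f}")
print(f"attack cost per request: {attack:.4f} ({attack / benign:.1f}x more expensive)")
print(f"attack amplification:    {amplification_ratio(60, 17_000, 500):.0f}x")
```

Under flat subscription pricing, the attacker's marginal cost for both queries is identical, while the provider's cost differs by more than an order of magnitude.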
Figure 1. Illustration of PI-DoS threat model. Adversaries craft malicious prompts that induce pathologically long reasoning traces compared to benign users. By launching attacks from multiple accounts, adversaries inflict disproportionate financial harm, exhaust computational resources, and degrade service quality.
We formally prove that these three properties are necessary for any practical PI-DoS attack. Through this analytical framework, we uncover fundamental limitations of existing methods: no existing work satisfies all three properties simultaneously.
Amplification. A practical PI-DoS attack must make each cheap, short query impose a disproportionately large computational burden. We prove that shorter prompts achieve higher amplification ratios under real serving policies, so short prompts that trigger extensive reasoning are optimal.
Stealthiness. Real-world deployments employ input filters, output monitors, and joint detectors. We prove that attacks relying on abnormal prompts or inducing abnormal outputs create large distributional shifts that detectors easily flag, so prompts must remain on the natural-language manifold.
Optimizability. If the attacker uses the victim's actual reasoning length as the reward, evaluation time grows with attack success, creating a self-defeating feedback loop. Practical PI-DoS optimization therefore requires constant-time surrogate feedback that remains efficient as attacks improve.
| Method | Amplification | Stealthiness | Optimizability |
|---|---|---|---|
| Manual Puzzles | | | |
| GCG-DoS / Engorgio / Excessive | | | |
| AutoDoS / CatAttack | | | |
| ICL / POT / ThinkTrap | | | |
| ReasoningBomb (Ours) | ✓ | ✓ | ✓ |
ReasoningBomb employs a multi-component architecture trained via reinforcement learning. The attacker model generates PI-DoS attack prompts, a length predictor evaluates their expected effectiveness using constant-time feedback, and a diversity evaluator encourages exploration of varied attack strategies.
Attacker Model. A trainable LRM generates attack prompts through reasoning. The model outputs in the format `<think> meta-reasoning </think> adversarial puzzle`, where the meta-reasoning segment allows explicit deliberation about attack strategies before producing the final prompt.
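Since only the final adversarial puzzle serves as the attack prompt, a minimal sketch of separating it from the meta-reasoning block, assuming the attacker's output follows the format above exactly, is shown here.

```python
import re

def extract_attack_prompt(attacker_output: str) -> str:
    """Strip the <think>...</think> meta-reasoning block and return the
    final adversarial puzzle that will be submitted to the victim LRM."""
    # Remove the meta-reasoning block (non-greedy, allow multi-line content).
    without_think = re.sub(r"<think>.*?</think>", "", attacker_output, flags=re.DOTALL)
    return without_think.strip()

sample = "<think>Target a multi-constraint search problem.</think> Find all seven-digit numbers such that ..."
print(extract_attack_prompt(sample))  # -> "Find all seven-digit numbers such that ..."
```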
Length Predictor. This component provides constant-time reward signals by estimating the expected reasoning-trace length a prompt would induce. Rather than running expensive autoregressive generation, we extract hidden states from a frozen victim model via a single forward pass, then apply a lightweight MLP to predict the length.
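A minimal PyTorch sketch of such a surrogate follows, assuming a Hugging Face-style frozen victim model and tokenizer; the mean-pooling choice, layer sizes, and the use of the last hidden layer are illustrative assumptions rather than the exact design.

```python
import torch
import torch.nn as nn

class LengthPredictor(nn.Module):
    """Lightweight MLP mapping pooled victim hidden states to a predicted
    reasoning-trace length (hidden sizes and pooling are assumptions)."""

    def __init__(self, hidden_dim: int = 4096, mlp_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, mlp_dim),
            nn.ReLU(),
            nn.Linear(mlp_dim, 1),
        )

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) from ONE forward pass of the
        # frozen victim model; attention_mask: (batch, seq_len).
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.mlp(pooled).squeeze(-1)  # predicted reasoning length per prompt

@torch.no_grad()
def surrogate_reward(victim, tokenizer, prompts, predictor, device="cuda"):
    """Constant-time reward: a single forward pass through the frozen victim plus
    the MLP head. No autoregressive generation is performed."""
    batch = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True).to(device)
    out = victim(**batch, output_hidden_states=True)
    return predictor(out.hidden_states[-1], batch["attention_mask"])
```

Because this reward costs one prefill pass regardless of how long the victim would actually reason, it stays constant even as the attack becomes more effective.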
Diversity Evaluator. This component encourages the attacker to explore varied attack strategies rather than collapsing to repetitive prompts. For each group of prompts, we compute text embeddings and measure pairwise cosine similarities; prompts dissimilar to the others receive higher diversity rewards.
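A minimal sketch of this group-level diversity reward, assuming precomputed text embeddings for one group of generated prompts (the exact scoring rule is an assumption):

```python
import torch
import torch.nn.functional as F

def diversity_rewards(embeddings: torch.Tensor) -> torch.Tensor:
    """Reward each prompt in a group by how dissimilar it is to the others.

    embeddings: (G, d) text embeddings for one group of generated prompts.
    Returns a (G,) tensor in which more-dissimilar prompts score higher.
    """
    emb = F.normalize(embeddings, dim=-1)
    sim = emb @ emb.T                          # pairwise cosine similarities, (G, G)
    sim.fill_diagonal_(0.0)                    # ignore self-similarity
    group_size = sim.size(0)
    mean_sim = sim.sum(dim=1) / max(group_size - 1, 1)  # avg similarity to the rest
    return 1.0 - mean_sim                      # dissimilar -> higher diversity reward
```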
Stage 1 (SFT): Supervised fine-tuning on adversarial prompts satisfying the token budget $L_{\mathrm{in}} \le L_{\mathrm{budget}}$, teaching reliable generation of valid short attack prompts. Stage 2 (RL): GRPO-based reinforcement learning in which the attacker maximizes the constant-time surrogate reward combining the length-prediction and diversity components.
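Putting the two components together, a Stage-2 reward for each prompt in a sampled group might be assembled as sketched below; the normalization constant and diversity weight are hypothetical, and the group-standardized advantage shown is the common GRPO formulation, not necessarily the paper's exact variant.

```python
import torch

def stage2_rewards(pred_lengths: torch.Tensor,
                   div_rewards: torch.Tensor,
                   length_scale: float = 20_000.0,
                   div_weight: float = 0.1) -> torch.Tensor:
    """Per-prompt reward for the RL stage: surrogate-predicted reasoning length
    (roughly normalized) plus a small diversity bonus. Weights are hypothetical."""
    return pred_lengths / length_scale + div_weight * div_rewards

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: standardize rewards within the sampled group,
    so prompts are credited relative to their siblings rather than an absolute scale."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```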
We evaluate ReasoningBomb on 10 victim models: 2 LLMs (DeepSeek-V3, Kimi-K2), 5 open-source LRMs (DeepSeek-R1, MiniMax-M2, Nemotron-3, Qwen3-30B, Qwen3-32B), and 3 commercial LRMs (Claude 4.5, GPT-5, Gemini 3).
ReasoningBomb (256-token budget) induces 18,759 completion tokens on average across all 10 models (LLMs + LRMs), and 19,263 reasoning tokens on average across LRMs, surpassing the runner-up by 35% in completion tokens and 38% in reasoning tokens.
Figure 2. Reasoning token distribution across all victim models. ReasoningBomb induces 6–7× more reasoning tokens than benign queries (SimpleQA: 1,222 tokens, SimpleBench: 4,276 tokens) and consistently outperforms all baselines.
Using only 60 input tokens on average, ReasoningBomb (128-token budget) induces 17,096 completion tokens, achieving 286.69× amplification. Per-model average reaches 301.98× (ranging from 39× to 640×), which is 197% better than the runner-up.
Figure 3. Amplification analysis: ReasoningBomb (green stars) achieves long outputs with short prompts. AutoDoS uses 5,215 input tokens for only 2.22× amplification; ICL uses 1,355 tokens for 5.85×, which are 129× and 49× worse than ours, respectively.
A practical PI-DoS training signal must be both informative (correlates with victim reasoning length) and constant-time (cost does not grow with attack success). Our surrogate reward replaces expensive autoregressive generation with a single forward pass.
Our length predictor achieves strong correlation on the training victim (DeepSeek-R1-32B: Pearson r=0.7485) and transfers meaningfully to different architectures (Qwen3-32B: r=0.5563), with 71–79% pairwise direction accuracy.
Figure 4. Correlation between surrogate predictor and actual victim generation length. Training split: Pearson r=0.7485, direction accuracy 79.4%; validation split: r=0.5635, accuracy 70.9%; transfer to Qwen3-32B: r=0.5563, accuracy 70.8%.
Direct victim feedback grows from 23.31s to 191.26s per query as attacks strengthen. Our surrogate remains constant at 0.19–0.22ms, yielding a 4.39×10⁵× cumulative speedup.
Figure 5. Response time comparison. Direct victim feedback: 23.31s→191.26s per query (601.6s total over 7 queries). Our surrogate: 0.19–0.22ms per query (1.37ms total), enabling practical RL optimization.
We evaluate against a strict dual-stage defense combining (1) input-side perplexity filter + LLM-as-judge, and (2) output-side LLM-as-judge on the first 2,000 tokens. ReasoningBomb achieves a 98.4% bypass rate under joint detection.
| Method | Input Det. | Output Det. | Dual-Stage |
|---|---|---|---|
| AutoDoS | 100.0% | 13.3% | 100.0% |
| Engorgio | 100.0% | 0.0% | 100.0% |
| CatAttack | 2.7% | 0.0% | 2.7% |
| ICL | 10.0% | 2.0% | 10.0% |
| ReasoningBomb (Ours) | 0.2% | 1.3% | 1.6% |
Detection rates on Qwen3-32B (lower is better for the attacker). Dual-stage combines input and output detection via an OR gate.
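A minimal sketch of this OR-gate combination, assuming black-box callables for the perplexity filter and the two LLM judges (their interfaces and the perplexity threshold are assumptions):

```python
def dual_stage_flag(prompt: str,
                    response_prefix: str,
                    ppl_filter,
                    input_judge,
                    output_judge,
                    ppl_threshold: float = 100.0) -> bool:
    """A request is flagged if EITHER the input stage (perplexity filter or
    LLM-as-judge on the prompt) OR the output stage (LLM-as-judge on the first
    2,000 response tokens, truncated by the caller) fires."""
    input_flag = ppl_filter(prompt) > ppl_threshold or input_judge(prompt)
    output_flag = output_judge(response_prefix)
    return input_flag or output_flag
```

An attack bypasses the defense only if it stays below the perplexity threshold, reads as benign to the input judge, and produces an output prefix that the output judge also considers benign.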
We simulate a production inference server (8×NVIDIA A100 node, FCFS queue) to measure how ReasoningBomb attacks impact benign user throughput under mixed traffic.
At just 10% malicious traffic with 32K response cap: benign throughput drops from 8.52 to 4.28 req/min (−49.8%), while attackers monopolize 64.3% of total compute time. Even 1% malicious requests occupy 20.7% of compute.
Figure 6. Real-world server simulation. (a) Benign User Processed (BUP) decreases as malicious ratio increases. (b) Computational Time Occupation (CTO) shows attackers dominate server resources. At 32K cap with 10% malicious traffic: −49.8% throughput, 64.3% compute monopolized.
We discuss several promising directions for defending against PI-DoS attacks on LRMs:
- Cache embeddings of known attack prompts to detect near-duplicates and skip expensive prefilling computation (a minimal sketch follows this list).
- Proactively discover attack prompts using the ReasoningBomb framework itself, then use them for adversarial fine-tuning.
- Pre-compute and cache solutions for known attack patterns to eliminate redundant reasoning computation.
- Fine-tune models to produce shorter reasoning traces when processing attack-like prompts without degrading benign performance.
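As one concrete illustration of the first direction, the sketch below caches embeddings of known attack prompts and flags near-duplicates before any prefilling work is done; the embedding function, similarity threshold, and class interface are all assumptions.

```python
import numpy as np

class AttackPromptCache:
    """Embedding cache of known attack prompts; incoming prompts whose cosine
    similarity to any cached entry exceeds the threshold are flagged so the
    server can skip expensive prefilling and reasoning for them."""

    def __init__(self, embed_fn, threshold: float = 0.92):
        self.embed_fn = embed_fn          # text -> vector (assumed external embedding model)
        self.threshold = threshold
        self.bank = None                  # (N, d) matrix of unit-norm embeddings

    def add(self, prompt: str) -> None:
        v = self._unit(self.embed_fn(prompt))
        self.bank = v[None, :] if self.bank is None else np.vstack([self.bank, v])

    def is_near_duplicate(self, prompt: str) -> bool:
        if self.bank is None:
            return False
        v = self._unit(self.embed_fn(prompt))
        return float(np.max(self.bank @ v)) >= self.threshold

    @staticmethod
    def _unit(v) -> np.ndarray:
        v = np.asarray(v, dtype=np.float32)
        return v / (np.linalg.norm(v) + 1e-8)
```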
If you find our work useful, please cite our paper:
@misc{liu2026reasoningbombstealthydenialofserviceattack,
title={ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing
Pathologically Long Reasoning in Large Reasoning Models},
author={Xiaogeng Liu and Xinyan Wang and Yechao Zhang and Sanjay Kariyappa
and Chong Xiang and Muhao Chen and G. Edward Suh and Chaowei Xiao},
year={2026},
eprint={2602.00154},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2602.00154},
}