A Stealthy Denial-of-Service Attack by Inducing Pathologically Long Reasoning in Large Reasoning Models
Large reasoning models (LRMs) extend large language models with explicit multi-step reasoning traces, but this capability introduces a new class of prompt-induced inference-time denial-of-service (PI-DoS) attacks that exploit the high computational cost of reasoning. We first formalize inference cost for LRMs and define PI-DoS, then prove that any practical PI-DoS attack must satisfy three properties: (1) a high amplification ratio, where each query induces a disproportionately long reasoning trace relative to its own length; (2) stealthiness, in which prompts and responses remain on the natural-language manifold and evade distribution-shift detectors; and (3) optimizability, in which the attack supports efficient optimization without being slowed by its own success.
Under this framework, we present ReasoningBomb, a reinforcement-learning-based PI-DoS framework that trains a large-reasoning-model attacker to generate short, natural prompts that drive victim LRMs into pathologically long reasoning. ReasoningBomb uses a two-stage pipeline combining supervised fine-tuning under a strict token budget with GRPO-based reinforcement learning, guided by a constant-time surrogate reward computed from victim-model hidden states via a lightweight MLP. Across seven open-source models and three commercial LRMs, ReasoningBomb induces an average of 18,759 completion tokens and 19,263 reasoning tokens, surpassing the runner-up baseline by 35% in completion tokens and 38% in reasoning tokens, while achieving a 286.7× input-to-output amplification ratio and a 98.4% bypass rate against dual-stage detection.
The cumulative inference-time computational cost of serving LRMs has grown to rival or exceed one-time training costs at scale. For an input of $L_{\mathrm{in}}$ tokens, the provider's per-request cost grows as $C_{\mathrm{req}} \approx \kappa (L_{\mathrm{in}} + L_{\mathrm{rp}} + L_{\mathrm{out}})$, where the reasoning length $L_{\mathrm{rp}}$ can substantially exceed the output length $L_{\mathrm{out}}$. This cost structure interacts poorly with subscription-based pricing (e.g., ChatGPT Plus, SuperGrok), where users pay a fixed fee while providers bear variable costs.
OpenAI CEO Sam Altman noted that users being extra polite and verbose with ChatGPT was "costing OpenAI tens of millions of dollars" in additional inference cost.
PI-DoS reflects the "Unbounded Consumption" risk in the OWASP Top 10 for LLM Applications (2025), where adversaries exploit adaptive computation to inflate per-request cost and degrade service for legitimate users.
We consider adversaries with legitimate access to LRM services via subscription tiers (e.g., ChatGPT Plus, SuperGrok) that enforce message-rate quotas rather than per-token billing. These pricing schemes create a fundamental economic asymmetry: while users pay a fixed fee, providers bear variable costs $C_{\mathrm{req}} = \kappa (L_{\mathrm{in}} + L_{\mathrm{rp}} + L_{\mathrm{out}})$ that scale sharply with reasoning trace length. The adversary's goal is to craft prompts that drive per-request cost far beyond benign traffic by inducing pathologically long reasoning traces. To amplify impact, adversaries can maintain multiple subscriptions in parallel, distributing adversarial prompts to circumvent per-account rate limits while forcing providers to process many high-cost requests simultaneously.
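To make the asymmetry concrete, the sketch below plugs hypothetical token counts and a hypothetical per-token cost κ into the linear cost model above and computes the resulting amplification ratio; all numbers are illustrative assumptions, not measured values.

```python
# Minimal sketch of the per-request cost model C_req = kappa * (L_in + L_rp + L_out)
# and the input-to-output amplification ratio. All numbers are hypothetical.

def per_request_cost(l_in: int, l_rp: int, l_out: int, kappa: float) -> float:
    """Provider-side cost of one request under the linear cost model."""
    return kappa * (l_in + l_rp + l_out)

def amplification_ratio(l_in: int, l_rp: int, l_out: int) -> float:
    """Generated tokens (reasoning + answer) per input token."""
    return (l_rp + l_out) / l_in

# A benign query: short prompt, moderate reasoning trace.
benign = per_request_cost(l_in=200, l_rp=1_000, l_out=300, kappa=1e-6)

# An adversarial query: an equally short prompt, pathologically long reasoning.
attack = per_request_cost(l_in=60, l_rp=17_000, l_out=500, kappa=1e-6)

print(f"benign cost per request: {benign:.4f}")
print(f"attack cost per request: {attack:.4f} ({attack / benign:.1f}x more expensive)")
print(f"attack amplification:    {amplification_ratio(60, 17_000, 500):.0f}x")
```

Under flat subscription pricing, the attacker's marginal cost for both queries is identical, while the provider's cost differs by more than an order of magnitude.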
Figure 1. Illustration of PI-DoS threat model. Adversaries craft malicious prompts that induce pathologically long reasoning traces compared to benign users. By launching attacks from multiple accounts, adversaries inflict disproportionate financial harm, exhaust computational resources, and degrade service quality.
We formally prove that these three properties are necessary for any practical PI-DoS attack. Through this analytical framework, we uncover fundamental limitations of existing methods: no existing work satisfies all three properties simultaneously.
Amplification. A practical PI-DoS attack must make each cheap, short query impose a disproportionately large computational burden. We prove that shorter prompts achieve higher amplification ratios under real serving policies, so short prompts that trigger extensive reasoning are optimal.
Stealthiness. Real-world deployments employ input filters, output monitors, and joint detectors. We prove that attacks relying on abnormal prompts or inducing abnormal outputs create large distributional shifts that detectors easily flag, so prompts must remain on the natural-language manifold.
Optimizability. If the attacker uses the victim's actual reasoning length as the reward, evaluation time grows with attack success, creating a self-defeating feedback loop. Practical PI-DoS optimization therefore requires constant-time surrogate feedback that remains efficient as attacks improve.
| Method | Amplification | Stealthiness | Optimizability |
|---|---|---|---|
| Manual Puzzles | | | |
| GCG-DoS / Engorgio / Excessive | | | |
| AutoDoS / CatAttack | | | |
| ICL / POT / ThinkTrap | | | |
| ReasoningBomb (Ours) | ✓ | ✓ | ✓ |
ReasoningBomb employs a multi-component architecture trained via reinforcement learning. The attacker model generates PI-DoS attack prompts, a length predictor evaluates their expected effectiveness using constant-time feedback, and a diversity evaluator encourages exploration of varied attack strategies.
Attacker Model. A trainable LRM generates attack prompts through reasoning. The model outputs in the format `<think> meta-reasoning </think> adversarial puzzle`, where the meta-reasoning segment allows explicit deliberation about attack strategies before producing the final prompt.
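Since only the final adversarial puzzle serves as the attack prompt, a minimal sketch of separating it from the meta-reasoning block, assuming the attacker's output follows the format above exactly, is shown here.

```python
import re

def extract_attack_prompt(attacker_output: str) -> str:
    """Strip the <think>...</think> meta-reasoning block and return the
    final adversarial puzzle that will be submitted to the victim LRM."""
    # Remove the meta-reasoning block (non-greedy, allow multi-line content).
    without_think = re.sub(r"<think>.*?</think>", "", attacker_output, flags=re.DOTALL)
    return without_think.strip()

sample = "<think>Target a multi-constraint search problem.</think> Find all seven-digit numbers such that ..."
print(extract_attack_prompt(sample))  # -> "Find all seven-digit numbers such that ..."
```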
Length Predictor. This component provides constant-time reward signals by estimating the expected reasoning-trace length a prompt would induce. Rather than running expensive autoregressive generation, we extract hidden states from a frozen victim model via a single forward pass, then apply a lightweight MLP to predict the length.
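A minimal PyTorch sketch of such a surrogate follows, assuming a Hugging Face-style frozen victim model and tokenizer; the mean-pooling choice, layer sizes, and the use of the last hidden layer are illustrative assumptions rather than the exact design.

```python
import torch
import torch.nn as nn

class LengthPredictor(nn.Module):
    """Lightweight MLP mapping pooled victim hidden states to a predicted
    reasoning-trace length (hidden sizes and pooling are assumptions)."""

    def __init__(self, hidden_dim: int = 4096, mlp_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, mlp_dim),
            nn.ReLU(),
            nn.Linear(mlp_dim, 1),
        )

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) from ONE forward pass of the
        # frozen victim model; attention_mask: (batch, seq_len).
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.mlp(pooled).squeeze(-1)  # predicted reasoning length per prompt

@torch.no_grad()
def surrogate_reward(victim, tokenizer, prompts, predictor, device="cuda"):
    """Constant-time reward: a single forward pass through the frozen victim plus
    the MLP head. No autoregressive generation is performed."""
    batch = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True).to(device)
    out = victim(**batch, output_hidden_states=True)
    return predictor(out.hidden_states[-1], batch["attention_mask"])
```

Because this reward costs one prefill pass regardless of how long the victim would actually reason, it stays constant even as the attack becomes more effective.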
Diversity Evaluator. This component encourages the attacker to explore varied attack strategies rather than collapsing to repetitive prompts. For each group of prompts, we compute text embeddings and measure pairwise cosine similarities; prompts dissimilar to the others receive higher diversity rewards.
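A minimal sketch of this group-level diversity reward, assuming precomputed text embeddings for one group of generated prompts (the exact scoring rule is an assumption):

```python
import torch
import torch.nn.functional as F

def diversity_rewards(embeddings: torch.Tensor) -> torch.Tensor:
    """Reward each prompt in a group by how dissimilar it is to the others.

    embeddings: (G, d) text embeddings for one group of generated prompts.
    Returns a (G,) tensor in which more-dissimilar prompts score higher.
    """
    emb = F.normalize(embeddings, dim=-1)
    sim = emb @ emb.T                          # pairwise cosine similarities, (G, G)
    sim.fill_diagonal_(0.0)                    # ignore self-similarity
    group_size = sim.size(0)
    mean_sim = sim.sum(dim=1) / max(group_size - 1, 1)  # avg similarity to the rest
    return 1.0 - mean_sim                      # dissimilar -> higher diversity reward
```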
Stage 1 (SFT): Supervised fine-tuning on adversarial prompts satisfying the token budget $L_{\mathrm{in}} \le L_{\mathrm{budget}}$, teaching reliable generation of valid short attack prompts. Stage 2 (RL): GRPO-based reinforcement learning in which the attacker maximizes the constant-time surrogate reward combining the length-prediction and diversity components.
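Putting the two components together, a Stage-2 reward for each prompt in a sampled group might be assembled as sketched below; the normalization constant and diversity weight are hypothetical, and the group-standardized advantage shown is the common GRPO formulation, not necessarily the paper's exact variant.

```python
import torch

def stage2_rewards(pred_lengths: torch.Tensor,
                   div_rewards: torch.Tensor,
                   length_scale: float = 20_000.0,
                   div_weight: float = 0.1) -> torch.Tensor:
    """Per-prompt reward for the RL stage: surrogate-predicted reasoning length
    (roughly normalized) plus a small diversity bonus. Weights are hypothetical."""
    return pred_lengths / length_scale + div_weight * div_rewards

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: standardize rewards within the sampled group,
    so prompts are credited relative to their siblings rather than an absolute scale."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```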
We evaluate ReasoningBomb on 10 victim models: 2 LLMs (DeepSeek-V3, Kimi-K2), 5 open-source LRMs (DeepSeek-R1, MiniMax-M2, Nemotron-3, Qwen3-30B, Qwen3-32B), and 3 commercial LRMs (Claude 4.5, GPT-5, Gemini 3).
ReasoningBomb (256-token budget) induces 18,759 completion tokens on average across all 10 models (LLMs + LRMs), and 19,263 reasoning tokens on average across LRMs, surpassing the runner-up by 35% in completion tokens and 38% in reasoning tokens.
Figure 2. Reasoning token distribution across all victim models. ReasoningBomb induces 6–7× more reasoning tokens than benign queries (SimpleQA: 1,222 tokens, SimpleBench: 4,276 tokens) and consistently outperforms all baselines.
Using only 60 input tokens on average, ReasoningBomb (128-token budget) induces 17,096 completion tokens, achieving 286.69× amplification. Per-model average reaches 301.98× (ranging from 39× to 640×), which is 197% better than the runner-up.
Figure 3. Amplification analysis: ReasoningBomb (green stars) achieves long outputs with short prompts. AutoDoS uses 5,215 input tokens for only 2.22× amplification; ICL uses 1,355 tokens for 5.85×, which are 129× and 49× worse than ours, respectively.
A practical PI-DoS training signal must be both informative (correlates with victim reasoning length) and constant-time (cost does not grow with attack success). Our surrogate reward replaces expensive autoregressive generation with a single forward pass.
Our length predictor achieves strong correlation on the training victim (DeepSeek-R1-32B: Pearson r=0.7485) and transfers meaningfully to different architectures (Qwen3-32B: r=0.5563), with 71–79% pairwise direction accuracy.
Figure 4. Correlation between surrogate predictor and actual victim generation length. Training split: Pearson r=0.7485, direction accuracy 79.4%; validation split: r=0.5635, accuracy 70.9%; transfer to Qwen3-32B: r=0.5563, accuracy 70.8%.
Direct victim feedback grows from 23.31s to 191.26s per query as attacks strengthen. Our surrogate remains constant at 0.19–0.22ms, yielding a 4.39×10⁵× cumulative speedup.
Figure 5. Response time comparison. Direct victim feedback: 23.31s→191.26s per query (601.6s total over 7 queries). Our surrogate: 0.19–0.22ms per query (1.37ms total), enabling practical RL optimization.
We evaluate against a strict dual-stage defense combining (1) input-side perplexity filter + LLM-as-judge, and (2) output-side LLM-as-judge on the first 2,000 tokens. ReasoningBomb achieves a 98.4% bypass rate under joint detection.
| Method | Input Det. | Output Det. | Dual-Stage |
|---|---|---|---|
| AutoDoS | 100.0% | 13.3% | 100.0% |
| Engorgio | 100.0% | 0.0% | 100.0% |
| CatAttack | 2.7% | 0.0% | 2.7% |
| ICL | 10.0% | 2.0% | 10.0% |
| ReasoningBomb (Ours) | 0.2% | 1.3% | 1.6% |
Detection rates on Qwen3-32B (lower is better for the attacker). Dual-stage combines input and output detection via an OR gate.
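A minimal sketch of this OR-gate combination, assuming black-box callables for the perplexity filter and the two LLM judges (their interfaces and the perplexity threshold are assumptions):

```python
def dual_stage_flag(prompt: str,
                    response_prefix: str,
                    ppl_filter,
                    input_judge,
                    output_judge,
                    ppl_threshold: float = 100.0) -> bool:
    """A request is flagged if EITHER the input stage (perplexity filter or
    LLM-as-judge on the prompt) OR the output stage (LLM-as-judge on the first
    2,000 response tokens, truncated by the caller) fires."""
    input_flag = ppl_filter(prompt) > ppl_threshold or input_judge(prompt)
    output_flag = output_judge(response_prefix)
    return input_flag or output_flag
```

An attack bypasses the defense only if it stays below the perplexity threshold, reads as benign to the input judge, and produces an output prefix that the output judge also considers benign.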
We simulate a production inference server (8×NVIDIA A100 node, FCFS queue) to measure how ReasoningBomb attacks impact benign user throughput under mixed traffic.
At just 10% malicious traffic with 32K response cap: benign throughput drops from 8.52 to 4.28 req/min (−49.8%), while attackers monopolize 64.3% of total compute time. Even 1% malicious requests occupy 20.7% of compute.
Figure 6. Real-world server simulation. (a) Benign User Processed (BUP) decreases as malicious ratio increases. (b) Computational Time Occupation (CTO) shows attackers dominate server resources. At 32K cap with 10% malicious traffic: −49.8% throughput, 64.3% compute monopolized.
We discuss several promising directions for defending against PI-DoS attacks on LRMs:
- Cache embeddings of known attack prompts to detect near-duplicates and skip expensive prefilling computation (a minimal sketch follows this list).
- Proactively discover attack prompts using the ReasoningBomb framework itself, then use them for adversarial fine-tuning.
- Pre-compute and cache solutions for known attack patterns to eliminate redundant reasoning computation.
- Fine-tune models to produce shorter reasoning traces when processing attack-like prompts without degrading benign performance.
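As one concrete illustration of the first direction, the sketch below caches embeddings of known attack prompts and flags near-duplicates before any prefilling work is done; the embedding function, similarity threshold, and class interface are all assumptions.

```python
import numpy as np

class AttackPromptCache:
    """Embedding cache of known attack prompts; incoming prompts whose cosine
    similarity to any cached entry exceeds the threshold are flagged so the
    server can skip expensive prefilling and reasoning for them."""

    def __init__(self, embed_fn, threshold: float = 0.92):
        self.embed_fn = embed_fn          # text -> vector (assumed external embedding model)
        self.threshold = threshold
        self.bank = None                  # (N, d) matrix of unit-norm embeddings

    def add(self, prompt: str) -> None:
        v = self._unit(self.embed_fn(prompt))
        self.bank = v[None, :] if self.bank is None else np.vstack([self.bank, v])

    def is_near_duplicate(self, prompt: str) -> bool:
        if self.bank is None:
            return False
        v = self._unit(self.embed_fn(prompt))
        return float(np.max(self.bank @ v)) >= self.threshold

    @staticmethod
    def _unit(v) -> np.ndarray:
        v = np.asarray(v, dtype=np.float32)
        return v / (np.linalg.norm(v) + 1e-8)
```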
If you find our work useful, please cite our paper:
@misc{liu2026reasoningbombstealthydenialofserviceattack,
title={ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing
Pathologically Long Reasoning in Large Reasoning Models},
author={Xiaogeng Liu and Xinyan Wang and Yechao Zhang and Sanjay Kariyappa
and Chong Xiang and Muhao Chen and G. Edward Suh and Chaowei Xiao},
year={2026},
eprint={2602.00154},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2602.00154},
}