Inception Labs Launches Mercury 2: World's Fastest Reasoning LLM Achieves 1,000+ Tokens Per Second via Diffusion Architecture
Inception Labs has released Mercury 2, a diffusion-based reasoning language model that generates more than 1,000 tokens per second on NVIDIA Blackwell GPUs, over five times faster than leading autoregressive competitors. Autoregressive models decode one token at a time. Mercury 2 instead refines many token positions in parallel, which lets it deliver reasoning-grade quality within real-time latency budgets. Input tokens are priced at $0.25 per million.
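The following toy Python sketch makes the contrast between sequential decoding and parallel refinement concrete. It is not Inception Labs' implementation. It assumes a generic masked-diffusion-style decoder: every position starts masked, and at each step a single model call proposes tokens for all positions, after which the most confident guesses are committed. The names `toy_model`, `autoregressive_decode`, `parallel_refine_decode`, `input_cost_usd`, and `generation_seconds` are hypothetical. The "model" simply copies a known target and assigns random confidences, so the only meaningful output is the number of sequential forward passes. The cost and latency helpers use only the $0.25-per-million input price and the 1,000 tokens/s figure stated in the article.

```python
import math
import random
from typing import List, Optional, Tuple

MASK = None  # marks a position whose token has not been decided yet


def toy_model(target: List[str], seq: List[Optional[str]],
              rng: random.Random) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for one network forward pass.

    Returns a (token, confidence) guess for every position in `seq` in a
    single call. It just 'knows' the target; confidences are random.
    """
    return [(target[i], rng.random()) for i in range(len(seq))]


def autoregressive_decode(target: List[str], rng: random.Random):
    """Sequential decoding: one forward pass per token, left to right."""
    seq: List[str] = []
    calls = 0
    for _ in range(len(target)):
        guesses = toy_model(target, seq + [MASK], rng)
        calls += 1
        seq.append(guesses[len(seq)][0])  # commit only the next position
    return seq, calls


def parallel_refine_decode(target: List[str], num_steps: int,
                           rng: random.Random):
    """Toy parallel refinement: all positions are predicted each step."""
    seq: List[Optional[str]] = [MASK] * len(target)
    calls = 0
    for step in range(num_steps):
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        if not masked:
            break
        guesses = toy_model(target, seq, rng)
        calls += 1
        # Commit an even share of the remaining masked positions, choosing
        # the most confident ones; the final step commits everything left.
        steps_left = num_steps - step
        k = math.ceil(len(masked) / steps_left)
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:k]:
            seq[i] = guesses[i][0]
    return seq, calls


def input_cost_usd(input_tokens: int, usd_per_million: float = 0.25) -> float:
    """Input-token cost at the article's quoted $0.25 per million tokens."""
    return input_tokens / 1_000_000 * usd_per_million


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Idealized generation time at a constant throughput."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    rng = random.Random(0)
    target = [f"tok{i}" for i in range(64)]
    ar_seq, ar_calls = autoregressive_decode(target, rng)
    pr_seq, pr_calls = parallel_refine_decode(target, num_steps=8, rng=rng)
    print(ar_calls, pr_calls)             # 64 8
    print(input_cost_usd(2_000_000))      # 0.5
    print(generation_seconds(500, 1000))  # 0.5
```

For a 64-token output, the sequential loop needs 64 dependent model calls, while the 8-step refiner needs only 8. Fewer sequential passes do not by themselves guarantee lower wall-clock latency, because each parallel pass processes the whole sequence. The 1,000 tokens/s figure is a measured hardware result that this toy does not attempt to reproduce.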
