Inception Labs has launched Mercury, which it claims is the fastest commercial-scale diffusion large language model (LLM), with output speed surpassing Gemini 2.5 Flash and comparable to GPT-4.1 Nano and Claude 3.5 Haiku.
Mercury can be accessed at chat.inceptionlabs.ai and on third-party platforms, generating output at over 700 tokens per second. Unlike traditional LLMs, which produce text autoregressively one token at a time, it uses a diffusion architecture that refines many tokens in parallel, which accounts for its high output speed.
The model is available via a first-party API, priced at $0.25 per million input tokens and $1 per million output tokens. Inception Labs first announced Mercury in February and recently published a technical report on the model.
Google, too, has presented diffusion models as an alternative to traditional autoregressive language models, citing their ability to iterate quickly and correct errors mid-generation. Mercury aims to bring this real-time responsiveness to chat applications.
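To illustrate why diffusion generation can be faster, here is a toy sketch of the idea: start from a fully masked sequence and refine all positions in parallel over a few denoising steps, rather than emitting one token per forward pass. This is a conceptual illustration only, not Mercury's actual algorithm; a real diffusion LLM uses a learned denoiser, whereas this sketch substitutes random choices from a hypothetical vocabulary.

```python
import random

def diffusion_generate(vocab, length=8, steps=4, seed=0):
    """Toy diffusion-style generation: begin with every position masked,
    then fill positions in parallel over a fixed number of refinement steps.
    (Illustrative sketch only -- a real model replaces the random choices
    below with a learned denoising network.)"""
    rng = random.Random(seed)
    seq = ["[MASK]"] * length
    for step in range(steps):
        # Each pass "denoises" a growing fraction of the masked positions.
        # All positions are visited in the same pass, so the cost per step
        # does not grow with sequence length the way token-by-token
        # autoregressive decoding does.
        for i in range(length):
            if seq[i] == "[MASK]" and rng.random() < (step + 1) / steps:
                seq[i] = rng.choice(vocab)
    # Final cleanup: fill any position that is still masked.
    return [t if t != "[MASK]" else rng.choice(vocab) for t in seq]

print(" ".join(diffusion_generate(["the", "cat", "sat"], length=6)))
```

The contrast with autoregressive decoding is the number of model passes: an autoregressive model needs one forward pass per generated token, while a diffusion model needs only a fixed number of refinement steps for the whole sequence, which is the intuition behind Mercury's throughput claims.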