Inception Labs has launched Mercury, which it claims is the fastest commercial-scale diffusion large language model (LLM), with output speed surpassing Gemini 2.5 Flash and comparable to GPT-4.1 Nano and Claude 3.5 Haiku.
Mercury can be accessed at chat.inceptionlabs.ai and on third-party platforms, generating output at over 700 tokens per second. Unlike traditional LLMs, which produce text autoregressively one token at a time, it uses a diffusion architecture that refines many tokens in parallel, which accounts for its high output speed.
The model is available via a first-party API, priced at $0.25 per million input tokens and $1 per million output tokens. Inception Labs first announced Mercury in February and recently published a technical report on the model.
Google, too, has presented diffusion models as an alternative to traditional autoregressive language models, citing their ability to iterate quickly and correct errors mid-generation. Mercury aims to bring this real-time responsiveness to chat applications.
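To illustrate why diffusion generation can be faster, here is a toy sketch of the idea: start from a fully masked sequence and refine all positions in parallel over a few denoising steps, rather than emitting one token per forward pass. This is a conceptual illustration only, not Mercury's actual algorithm; a real diffusion LLM uses a learned denoiser, whereas this sketch substitutes random choices from a hypothetical vocabulary.

```python
import random

def diffusion_generate(vocab, length=8, steps=4, seed=0):
    """Toy diffusion-style generation: begin with every position masked,
    then fill positions in parallel over a fixed number of refinement steps.
    (Illustrative sketch only -- a real model replaces the random choices
    below with a learned denoising network.)"""
    rng = random.Random(seed)
    seq = ["[MASK]"] * length
    for step in range(steps):
        # Each pass "denoises" a growing fraction of the masked positions.
        # All positions are visited in the same pass, so the cost per step
        # does not grow with sequence length the way token-by-token
        # autoregressive decoding does.
        for i in range(length):
            if seq[i] == "[MASK]" and rng.random() < (step + 1) / steps:
                seq[i] = rng.choice(vocab)
    # Final cleanup: fill any position that is still masked.
    return [t if t != "[MASK]" else rng.choice(vocab) for t in seq]

print(" ".join(diffusion_generate(["the", "cat", "sat"], length=6)))
```

The contrast with autoregressive decoding is the number of model passes: an autoregressive model needs one forward pass per generated token, while a diffusion model needs only a fixed number of refinement steps for the whole sequence, which is the intuition behind Mercury's throughput claims.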