California-based startup Inception Labs has introduced Mercury, claimed to be the first commercial-scale diffusion large language model.
According to the company, Mercury runs up to ten times faster than current models, generating over 1,000 tokens per second on NVIDIA H100 GPUs.
Unlike autoregressive models, which produce text one token at a time, Mercury uses a diffusion process: it drafts all tokens in parallel and refines them over multiple steps into the final output.
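The parallel-refinement idea can be illustrated with a toy sketch. This is not Mercury's actual algorithm (which is proprietary); it only shows the control flow of masked-diffusion decoding, where every position is updated at each step rather than left to right. The `denoise_step` stand-in and the tiny fixed vocabulary are hypothetical placeholders for a learned denoising network.

```python
import random

MASK = "_"
# Hypothetical target vocabulary; a real model predicts from its full vocabulary.
target_vocab = ["the", "cat", "sat", "on", "mat"]

def denoise_step(tokens, confidence):
    # Toy "denoiser": fill in a fraction of masked positions in parallel.
    # A real diffusion LM would predict every position with a neural network.
    out = list(tokens)
    for i, t in enumerate(out):
        if t == MASK and random.random() < confidence:
            out[i] = target_vocab[i % len(target_vocab)]
    return out

def diffusion_generate(length=5, steps=4):
    # Start from a fully masked sequence and refine ALL positions at every
    # step, in contrast to autoregressive left-to-right decoding.
    tokens = [MASK] * length
    for step in range(steps):
        tokens = denoise_step(tokens, confidence=(step + 1) / steps)
    return tokens

print(diffusion_generate())
```

Because all positions are updated per step, the number of model calls scales with the (small, fixed) step count rather than with sequence length, which is the basis of the speed claims above.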
Inception Labs' own evaluations show Mercury outperforming other small models in output speed, and the company argues that diffusion models also hold advantages in reasoning and in iteratively refining their output.