Microsoft has released Phi-4-mini-Flash-Reasoning on Hugging Face, an efficient language model designed for long-context reasoning with fast inference.
The model is built on a new hybrid architecture, SambaY, which combines State Space Models with attention layers through Gated Memory Units, significantly reducing latency in long-context scenarios.
Phi-4-mini-Flash-Reasoning excels at complex reasoning tasks, outperforming its predecessor, Phi-4-mini-Reasoning, and posting strong results on benchmarks such as Math500 and AIME24/25.
Optimized for long Chain-of-Thought generation, the model delivers up to 10× higher throughput than its predecessor, demonstrating efficient long-context processing and real-time inference capability.