Microsoft has introduced Phi-4-mini-flash-reasoning, a compact AI model optimized for fast on-device logical reasoning in low-latency environments such as mobile apps and edge deployments, delivering up to 10 times higher throughput and a two-to-three-times average reduction in latency compared with its predecessor.
The new model uses a 'decoder-hybrid-decoder' architecture called SambaY, which combines state-space models and sliding-window attention in its self-decoder and introduces a novel Gated Memory Unit (GMU) that shares representations between layers, improving decoding efficiency and long-context performance.
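The GMU is described as replacing expensive attention computation in parts of the cross-decoder with cheap element-wise gating over representations shared from an earlier layer. As a rough illustration only, the PyTorch sketch below shows one plausible form of such a unit; the class name, the two linear projections, and the SiLU gate are assumptions made for this example, not Microsoft's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMemoryUnit(nn.Module):
    """Hypothetical sketch of a GMU: gate a memory state shared from an
    earlier layer using the current layer's hidden states, replacing a
    full attention pass with element-wise operations."""

    def __init__(self, d_model: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, hidden: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # hidden: current-layer activations; memory: representations shared
        # from an earlier layer (both shaped [batch, seq_len, d_model])
        gate = F.silu(self.gate_proj(hidden))
        # Element-wise gating of the shared memory: no attention recomputation
        return self.out_proj(gate * memory)
```

Because the gating is purely element-wise, its cost grows linearly with sequence length, which is consistent with the claimed decoding-efficiency gains at long generation lengths.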
Phi-4-mini-flash-reasoning outperforms larger models on math-reasoning benchmarks such as AIME24/25 and Math500 while sustaining faster response times under the vLLM inference framework, making it suitable for real-time tutoring tools and adaptive learning apps.
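Since the latency and throughput figures were measured with vLLM, a minimal serving sketch with that framework is shown below. The Hugging Face model ID microsoft/Phi-4-mini-flash-reasoning and the sampling settings are assumptions to verify against the published model card.

```python
from vllm import LLM, SamplingParams

# Load the model from Hugging Face (assumed ID; check the model card).
llm = LLM(model="microsoft/Phi-4-mini-flash-reasoning", trust_remote_code=True)

# Reasoning models typically need generous token budgets for chains of thought.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

prompt = "Solve step by step: what is the sum of the first 50 positive integers?"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```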
The model aligns with Microsoft’s responsible AI principles, incorporating safety measures such as supervised fine-tuning and reinforcement learning from human feedback. It is available through Azure AI Foundry, Hugging Face, and the NVIDIA API Catalog.
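For local experimentation, the Hugging Face checkpoint can presumably be loaded with the standard transformers API; the snippet below is a sketch under that assumption, reusing the same assumed model ID as above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-flash-reasoning"  # assumed Hugging Face ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Chat-style prompting via the tokenizer's chat template.
messages = [{"role": "user", "content": "How many primes are below 20?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```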