Microsoft has introduced Mu, a compact on-device language model designed to run on Copilot+ PCs' neural processing units.
Mu is built on a Transformer encoder-decoder architecture, which improves inference efficiency by separating input processing from output generation: the encoder maps the input tokens to a latent representation once, and the decoder reuses that representation at every generation step, allowing the model to exceed 100 tokens per second.
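To make that split concrete, here is a minimal sketch of the encode-once, decode-step-by-step pattern an encoder-decoder model follows. The model, dimensions, and token IDs below are placeholders for illustration, not Mu's actual implementation:

```python
import torch
import torch.nn as nn

# Illustrative encoder-decoder inference loop; sizes are placeholders.
d_model, vocab_size = 64, 1000
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)
embed = nn.Embedding(vocab_size, d_model)
to_logits = nn.Linear(d_model, vocab_size)

prompt = torch.randint(0, vocab_size, (1, 12))  # hypothetical input tokens

# Encode the full input once; the result is reused for every output step.
memory = model.encoder(embed(prompt))

# Decode autoregressively: each step attends to the cached encoder output,
# so the cost of processing the input is paid a single time.
generated = torch.zeros(1, 1, dtype=torch.long)  # start token (id 0)
for _ in range(8):
    tgt = embed(generated)
    mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
    out = model.decoder(tgt, memory, tgt_mask=mask)
    next_token = to_logits(out[:, -1]).argmax(-1, keepdim=True)
    generated = torch.cat([generated, next_token], dim=1)

print(generated)
```

Because `memory` is computed a single time and shared across all decoding steps, the input never has to be re-processed during generation, which is the efficiency win the architecture is chosen for.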
The 330-million-parameter model was trained on A100 GPUs in Azure and incorporates techniques such as dual LayerNorm and rotary positional embeddings to improve both speed and accuracy.
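As an illustration of one of these techniques, the sketch below implements a standard rotate-half form of rotary positional embeddings in NumPy. The announcement does not specify which RoPE variant Mu uses, so treat this as a generic example:

```python
import numpy as np

def rotary_embed(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate pairs of feature channels by a position-dependent angle."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)     # per-pair frequencies
    angles = np.outer(np.arange(seq_len), freqs)  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # A standard 2-D rotation applied to each (x1, x2) channel pair:
    # position information is injected without any learned parameters.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(12, 64)  # hypothetical query vectors for 12 positions
print(rotary_embed(q).shape)  # (12, 64)
```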
Microsoft applied post-training quantization, collaborating with hardware partners such as AMD, Intel, and Qualcomm to optimize the model for lower-precision formats on edge devices.
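Post-training quantization converts already-trained weights to lower-precision integers without any retraining. Below is a minimal sketch of symmetric per-tensor int8 quantization; the exact precisions and calibration scheme Mu ships with on each NPU are not public, so this shows only the general idea:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with a single scale per tensor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)  # hypothetical weight matrix
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```

The int8 tensor occupies a quarter of the float32 memory, which is what makes the model practical on memory- and power-constrained NPUs.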
Integrated into Windows Settings, Mu lets users make system adjustments through natural-language queries, which it maps to concrete settings actions.
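One way such a mapping can work is for the model to emit a structured action that the settings layer then executes. The sketch below is entirely hypothetical: the `SettingsAction` schema and `apply_action` helper are invented for illustration and are not the actual Windows Settings integration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SettingsAction:
    setting: str            # e.g. "display.nightlight" (hypothetical name)
    operation: str          # e.g. "set", "increase", "toggle"
    value: Optional[int] = None

def apply_action(action: SettingsAction) -> str:
    # A real agent would call into the OS here; we just describe the call.
    suffix = f" to {action.value}" if action.value is not None else ""
    return f"{action.operation} {action.setting}{suffix}"

# Suppose the model decoded the query "make my screen easier on the eyes
# at night" into this structured action:
action = SettingsAction(setting="display.nightlight", operation="toggle")
print(apply_action(action))
```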
The model performed best on multi-word queries with clear intent, suggesting it needs sufficient context to interpret a request accurately.
Task-specific fine-tuning enabled Mu to meet its performance and latency goals; this involved scaling up the training dataset and expanding support to a wider range of system settings.
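For reference, a schematic teacher-forced fine-tuning step on (query, action) pairs might look like the following. The model, optimizer settings, and synthetic batch are placeholders, not Mu's actual training setup:

```python
import torch
import torch.nn as nn

vocab, d_model = 1000, 64
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)
embed = nn.Embedding(vocab, d_model)
head = nn.Linear(d_model, vocab)
params = (list(model.parameters()) + list(embed.parameters())
          + list(head.parameters()))
opt = torch.optim.AdamW(params, lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One synthetic batch: tokenized queries in, tokenized actions out.
src = torch.randint(0, vocab, (8, 16))
tgt = torch.randint(0, vocab, (8, 6))

for step in range(3):  # a few illustrative steps
    # Teacher forcing: the decoder sees targets shifted right by one token
    # and a causal mask prevents it from peeking at future positions.
    tgt_in = embed(tgt[:, :-1])
    mask = nn.Transformer.generate_square_subsequent_mask(tgt_in.size(1))
    out = model(embed(src), tgt_in, tgt_mask=mask)
    loss = loss_fn(head(out).reshape(-1, vocab), tgt[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: loss {loss.item():.3f}")
```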
Mu builds on Microsoft's earlier Phi models, including Phi Silica, and is expected to underpin future AI agents on Windows devices.