Microsoft is expanding its Phi line of language models with two new additions optimized for multimodal processing and hardware efficiency.
The first, Phi-4-mini, is a text-only model with 3.8 billion parameters. Built on a decoder-only transformer architecture, it is designed to reduce hardware usage and improve processing speed.
The second model, Phi-4-multimodal, has 5.6 billion parameters and can process text, images, audio, and video, outperforming other multimodal models in benchmark tests.
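To give a rough sense of what these parameter counts mean for hardware, the Python sketch below estimates how much memory each model's weights alone would occupy at a few common storage precisions. The precision list and the weights-only formula (parameters × bytes per parameter) are illustrative assumptions, not figures published by Microsoft. Actual memory use is higher once activations, the key-value cache, and runtime overhead are counted.

```python
# Rough, weights-only memory estimate for the two new Phi-4 models.
# Assumption: memory ~= parameter count x bytes per parameter. This ignores
# activations, the KV cache, and framework overhead, so real usage is higher.

PARAMS = {
    "Phi-4-mini": 3_800_000_000,
    "Phi-4-multimodal": 5_600_000_000,
}

# Bytes per parameter at common storage precisions (illustrative choices).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Return the approximate size of the weights in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # Print one line per model, listing the estimate at each precision.
    for model, n in PARAMS.items():
        sizes = ", ".join(
            f"{prec}: {weight_memory_gb(n, b):.1f} GB"
            for prec, b in BYTES_PER_PARAM.items()
        )
        print(f"{model}: {sizes}")
```

By this estimate, Phi-4-mini's weights take about 7.6 GB at 16-bit precision and about 1.9 GB at 4-bit, which is why models of this size are candidates for laptops and other constrained hardware.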
Both Phi-4-mini and Phi-4-multimodal will be made available on Hugging Face under an MIT license.