Microsoft Research has introduced BitNet b1.58 2B4T, a 2-billion-parameter language model whose weights are constrained to the ternary values {-1, 0, +1}, which amounts to roughly 1.58 bits (log2 3) of information per weight instead of the usual 16 or 32 bits of full-precision formats. Despite this compact representation, it matches the performance of comparable full-precision models and runs efficiently on both GPUs and CPUs.
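To make the "1.58 bits per weight" idea concrete, the sketch below implements the absmean ternary quantization scheme described in the BitNet b1.58 work: each weight matrix is scaled by its mean absolute value, then rounded and clipped to {-1, 0, +1}. The function name and per-tensor scaling granularity here are illustrative assumptions, not Microsoft's released implementation.

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a full-precision weight matrix to ternary values {-1, 0, +1}.

    Illustrative sketch of absmean quantization: scale by the mean absolute
    value, then round and clip to the nearest ternary value. The scale is
    kept so outputs can be rescaled after the (multiplication-free) matmul.
    """
    scale = w.abs().mean().clamp(min=eps)          # per-tensor absmean scale (assumed granularity)
    w_ternary = (w / scale).round().clamp(-1, 1)   # values in {-1, 0, +1}
    return w_ternary, scale

# Example: a small random weight matrix collapses to three distinct values.
w = torch.randn(4, 8)
w_q, s = absmean_ternary_quantize(w)
print(w_q.unique())   # tensor([-1., 0., 1.])
print(s)              # scalar scale factor reused at inference time
```

Because the quantized weights take only three values, matrix multiplications reduce to additions and subtractions, which is where the memory and energy savings come from.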
The model was trained on a corpus of 4 trillion tokens and performs well across a wide range of tasks, including language understanding, math, coding, and conversation. Microsoft has released the model weights on Hugging Face, along with open-source inference code for running it.
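A minimal sketch of loading and prompting the released checkpoint through the standard Hugging Face transformers chat API is shown below. The repository id used here (microsoft/bitnet-b1.58-2B-4T) and any library version requirements are assumptions; check the official model card for the exact loading instructions.

```python
# Hedged example: repo id and loading path are assumptions, not verified
# against the official model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/bitnet-b1.58-2B-4T"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Build a chat prompt and generate a short reply.
messages = [{"role": "user", "content": "Explain 1.58-bit weights in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```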
BitNet b1.58 2B4T achieves performance on par with leading open-weight, full-precision LLMs of similar size, while offering significant advantages in computational efficiency, including substantially reduced memory footprint, energy consumption, and decoding latency.
BitNet b1.58 2B4T represents a meaningful step forward in making AI models more efficient and accessible by dramatically reducing the computational requirements of large language models without compromising performance.