<ul><li>Zyphra Technologies has released Zyda-2, an open pretraining dataset comprising 5 trillion tokens.</li><li>Zyda-2 has been distilled to retain the strengths of existing datasets while eliminating their weaknesses.</li><li>The Zamba2 small language model performs significantly better when trained on Zyda-2 than when trained on other state-of-the-art language modeling datasets.</li><li>The dataset aims to help enterprises train high-accuracy small language models for edge and consumer devices.</li></ul>

Zyphra’s new Zyda-2 dataset lets enterprises train small LLMs with high accuracy
