Prime Intellect has introduced SYNTHETIC-1, an open-source dataset designed to provide verified reasoning traces in math, coding, and science.
SYNTHETIC-1 consists of 1.4 million structured tasks and verifiers, addressing the need for reliable and well-organized data for reasoning models.
The dataset pairs each task type with its own verification method: math problems with symbolic verifiers, coding problems with unit tests, open-ended STEM questions evaluated by an LLM, real-world software engineering tasks, and code output prediction tasks.
SYNTHETIC-1 aims to improve machine reasoning across these domains, and Prime Intellect encourages ongoing collaboration to expand the dataset and refine it as an AI training resource.
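
To make the idea of a task paired with a verifier concrete, here is a minimal Python sketch. It is not SYNTHETIC-1's actual schema or verification code: the `Task` record, `make_math_verifier`, and `make_code_verifier` are hypothetical names invented for illustration. The math check uses exact rational comparison as a simple stand-in for a symbolic verifier, and the code check runs a candidate function against a few unit-test cases without any sandboxing.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Any, Callable, Sequence, Tuple


@dataclass
class Task:
    """Hypothetical record: a prompt plus a function that checks an answer."""
    kind: str
    prompt: str
    verify: Callable[[str], bool]


def make_math_verifier(expected: str) -> Callable[[str], bool]:
    """Accept any answer that is exactly equal to `expected` as a rational.

    Assumption: answers are plain numbers such as "0.5" or "2/4". A real
    symbolic verifier would also handle algebraic expressions.
    """
    target = Fraction(expected)

    def verify(answer: str) -> bool:
        try:
            return Fraction(answer.strip()) == target
        except (ValueError, ZeroDivisionError):
            return False  # unparseable answers count as wrong

    return verify


def make_code_verifier(
    func_name: str, cases: Sequence[Tuple[tuple, Any]]
) -> Callable[[str], bool]:
    """Accept source code whose `func_name` passes every (args, expected) case."""

    def verify(source: str) -> bool:
        namespace: dict = {}
        try:
            # Illustration only: exec is NOT sandboxed, so never run
            # untrusted model output this way in practice.
            exec(source, namespace)
            func = namespace[func_name]
            return all(func(*args) == expected for args, expected in cases)
        except Exception:
            return False  # syntax errors, missing function, runtime errors

    return verify


math_task = Task("math", "What is 1/4 + 1/4?", make_math_verifier("1/2"))
code_task = Task(
    "code",
    "Write add(a, b) that returns the sum of a and b.",
    make_code_verifier("add", [((1, 2), 3), ((0, 0), 0)]),
)
```

With these definitions, `math_task.verify("0.5")` and `math_task.verify("2/4")` both pass because the comparison is exact rather than string-based, while `code_task.verify(...)` passes only when the submitted source defines an `add` function that satisfies every test case.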