Large Language Models (LLMs) achieve strong results on mathematical reasoning benchmarks such as GSM8K, MATH, and AIME.
Model quantization is widely used to reduce memory usage and inference time, yet it can degrade mathematical reasoning accuracy by up to 69.81%.
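For context, the snippet below is a minimal sketch of loading a model under 4-bit weight quantization with Hugging Face transformers and bitsandbytes; the model name and quantization settings are illustrative assumptions, since the text does not specify which methods or models were evaluated.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative model choice; not necessarily one of the models studied.
model_id = "Qwen/Qwen2.5-7B-Instruct"

# 4-bit NF4 weight-only quantization, one common low-bit configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```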
To understand and categorize the errors that quantization introduces, a study was conducted across mainstream quantization methods and popular open-source models.
An automated data-curation pipeline has been developed to create a compact dataset that, when used to train a quantized model, can restore its reasoning accuracy within a few minutes on a single GPU.
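The text does not describe how the pipeline selects data, so the sketch below is only one plausible interpretation: compare full-precision and quantized outputs on reasoning problems and keep the cases where only the quantized model fails. The `Example` type, the `generate_answer` helper, and the substring answer check are hypothetical placeholders, not the pipeline's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    reference_answer: str  # the known final answer, used for correctness checks

def generate_answer(model, tokenizer, question: str) -> str:
    """Hypothetical helper: greedy-decode one answer for a single question."""
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Strip the prompt tokens and return only the generated continuation.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

def curate_recovery_set(full_model, quant_model, tokenizer, problems):
    """Keep problems the full-precision model solves but the quantized model gets wrong.

    The kept (question, full-precision solution) pairs form a compact dataset
    targeted at the errors introduced by quantization. A plain substring check
    stands in for proper answer extraction and matching.
    """
    curated = []
    for ex in problems:
        full_ans = generate_answer(full_model, tokenizer, ex.question)
        quant_ans = generate_answer(quant_model, tokenizer, ex.question)
        if ex.reference_answer in full_ans and ex.reference_answer not in quant_ans:
            curated.append({"prompt": ex.question, "completion": full_ans})
    return curated
```

The resulting pairs could then be used for a brief parameter-efficient fine-tune (for example, LoRA) of the quantized model, which is consistent with the claim that accuracy is restored within a few minutes on a single GPU.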