Researchers propose Rank-Insensitive LoRA-based Quantization Error Compensation (RILQ) to diagnose and address the limitations of LoRA-based quantization error compensation (LQEC) in sub-4-bit scenarios.
RILQ employs a model-wise activation discrepancy loss to adjust adapters cooperatively across layers, enabling robust error compensation even with low-rank adapters.
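The model-wise loss idea can be sketched on a toy two-layer linear model: a single discrepancy loss on the *final* activations backpropagates through all layers, so every layer's low-rank adapter is updated jointly toward compensating the overall quantization error, rather than each layer reconstructing its own output in isolation. Everything below (the two-layer model, the naive uniform quantizer, rank, learning rate) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 16, 2, 64  # hidden size, adapter rank, batch size (toy values)

# Full-precision weights of a toy two-layer linear model (a stand-in for
# transformer weight matrices in an LLM).
W1 = rng.standard_normal((d, d)) / np.sqrt(d)
W2 = rng.standard_normal((d, d)) / np.sqrt(d)

def quantize(w, bits=2):
    """Naive uniform symmetric quantizer (illustrative, not a real 2-bit PTQ)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

Q1, Q2 = quantize(W1), quantize(W2)

# Rank-r adapters in LoRA form (weight correction = B @ A), B starts at zero
# so the adapted model initially equals the plain quantized model.
A1, B1 = rng.standard_normal((r, d)) * 0.1, np.zeros((d, r))
A2, B2 = rng.standard_normal((r, d)) * 0.1, np.zeros((d, r))

X = rng.standard_normal((n, d))       # calibration inputs
Y_fp = X @ W1 @ W2                    # full-precision model activations

lr = 0.02
for _ in range(300):
    M1 = Q1 + B1 @ A1                 # adapter-merged quantized weights
    M2 = Q2 + B2 @ A2
    H = X @ M1
    E = H @ M2 - Y_fp                 # model-wise activation discrepancy
    # Gradients of the mean squared discrepancy w.r.t. both merged weights:
    # one model-level loss couples the layers, so both adapters move together.
    G2 = H.T @ E / n                  # dL/dM2 (up to a constant factor)
    G1 = X.T @ (E @ M2.T) / n         # dL/dM1
    dB2, dA2 = G2 @ A2.T, B2.T @ G2   # chain rule through M2 = Q2 + B2 @ A2
    dB1, dA1 = G1 @ A1.T, B1.T @ G1
    B2 -= lr * dB2; A2 -= lr * dA2
    B1 -= lr * dB1; A1 -= lr * dA1

base = np.mean((X @ Q1 @ Q2 - Y_fp) ** 2)                     # no adapters
loss = np.mean((X @ (Q1 + B1 @ A1) @ (Q2 + B2 @ A2) - Y_fp) ** 2)
print(loss < base)  # adapters reduce the model-level discrepancy
```

Because the gradient of the final-activation loss flows through both weight matrices, even rank-2 adapters can trim the end-to-end error, which is the intuition behind rank-insensitive compensation.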
Evaluations on LLaMA-2 and LLaMA-3 demonstrate that RILQ consistently improves 2-bit quantized inference across various quantizers and enhances accuracy in task-specific fine-tuning.
RILQ enables adapter-merged weight-quantized Large Language Model (LLM) inference with significantly enhanced accuracy, making it a promising approach for boosting 2-bit LLM performance.