Decentralized training of large language models faces network communication bottlenecks in pipeline-parallel settings due to the frequent exchange of intermediate activations.
Existing activation compression methods such as AQ-SGD incur additional memory overhead; we introduce TAH-Quant, an activation quantization framework tailored to pipeline parallelism.
TAH-Quant combines tile-wise quantization, token-level adaptive bit allocation, and a Hadamard-based transform that efficiently suppresses quantization outliers.
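A minimal sketch of the general idea (not the authors' implementation): activations are split into fixed-size tiles, a Hadamard transform is applied per tile to spread out outliers, each token is assigned a bit-width based on its relative norm, and each tile is then quantized uniformly. The tile size, the norm-based allocation rule, and the {2, 4, 8}-bit budget below are illustrative assumptions.

```python
import torch


def hadamard(n: int) -> torch.Tensor:
    """Sylvester construction of an orthonormal n x n Hadamard matrix (n a power of 2)."""
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1), torch.cat([H, -H], dim=1)], dim=0)
    return H / n ** 0.5


def quantize_tile(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric quantize-dequantize of a single tile."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale


def tah_quant_sketch(acts: torch.Tensor, tile: int = 64) -> torch.Tensor:
    """acts: (tokens, hidden). Returns the dequantized approximation."""
    tokens, hidden = acts.shape
    assert hidden % tile == 0
    H = hadamard(tile)

    # Token-level adaptive bit allocation (assumed rule): larger-norm tokens get more bits.
    norms = acts.norm(dim=1)
    ranks = norms.argsort().argsort().float() / max(tokens - 1, 1)
    bits_per_token = [8 if r > 0.75 else (4 if r > 0.25 else 2) for r in ranks.tolist()]

    out = torch.empty_like(acts)
    for t in range(tokens):
        row = acts[t].view(-1, tile) @ H  # rotate each tile to suppress outliers
        q = torch.stack([quantize_tile(r, bits_per_token[t]) for r in row])
        out[t] = (q @ H.T).reshape(hidden)  # inverse transform after dequantization
    return out


if __name__ == "__main__":
    x = torch.randn(16, 256)
    x[3, 7] = 40.0  # inject an activation outlier
    x_hat = tah_quant_sketch(x)
    print("relative error:", ((x - x_hat).norm() / x.norm()).item())
```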
Experimental results show that TAH-Quant achieves an aggressive activation quantization ratio and end-to-end training speedup while matching the accuracy of state-of-the-art methods and incurring no extra memory overhead.