Decentralized training of large language models faces network communication bottlenecks in pipeline-parallel settings due to the frequent exchange of intermediate activations.
Existing activation compression methods such as AQ-SGD incur additional memory overhead; we introduce TAH-Quant, an activation quantization framework tailored to pipeline parallelism.
TAH-Quant combines tile-wise quantization, token-level adaptive bit allocation, and a Hadamard-based transform that efficiently suppresses quantization outliers.
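A minimal sketch of the general idea (not the authors' implementation): activations are split into fixed-size tiles, a Hadamard transform is applied per tile to spread out outliers, each token is assigned a bit-width based on its relative norm, and each tile is then quantized uniformly. The tile size, the norm-based allocation rule, and the {2, 4, 8}-bit budget below are illustrative assumptions.

```python
import torch


def hadamard(n: int) -> torch.Tensor:
    """Sylvester construction of an orthonormal n x n Hadamard matrix (n a power of 2)."""
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1), torch.cat([H, -H], dim=1)], dim=0)
    return H / n ** 0.5


def quantize_tile(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric quantize-dequantize of a single tile."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale


def tah_quant_sketch(acts: torch.Tensor, tile: int = 64) -> torch.Tensor:
    """acts: (tokens, hidden). Returns the dequantized approximation."""
    tokens, hidden = acts.shape
    assert hidden % tile == 0
    H = hadamard(tile)

    # Token-level adaptive bit allocation (assumed rule): larger-norm tokens get more bits.
    norms = acts.norm(dim=1)
    ranks = norms.argsort().argsort().float() / max(tokens - 1, 1)
    bits_per_token = [8 if r > 0.75 else (4 if r > 0.25 else 2) for r in ranks.tolist()]

    out = torch.empty_like(acts)
    for t in range(tokens):
        row = acts[t].view(-1, tile) @ H  # rotate each tile to suppress outliers
        q = torch.stack([quantize_tile(r, bits_per_token[t]) for r in row])
        out[t] = (q @ H.T).reshape(hidden)  # inverse transform after dequantization
    return out


if __name__ == "__main__":
    x = torch.randn(16, 256)
    x[3, 7] = 40.0  # inject an activation outlier
    x_hat = tah_quant_sketch(x)
    print("relative error:", ((x - x_hat).norm() / x.norm()).item())
```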
Experimental results show that TAH-Quant achieves an aggressive activation quantization ratio and end-to-end training speedup while matching the accuracy of state-of-the-art methods and incurring no extra memory overhead.