Recent research introduced BackSlash, a compression algorithm for large language models (LLMs), emphasizing the influence of the statistical distribution of model parameters on model performance.
The research found that pre-trained LLM parameters are better modeled by generalized Gaussian distributions (GGDs) than by a standard Gaussian, leading to the proposal of an end-to-end framework for LLM optimization based on the GG model.
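As a rough illustration of that distributional claim (not the paper's own analysis), one can fit a GGD to a flattened weight tensor and inspect the fitted shape parameter; scipy's gennorm reduces to a Laplacian at shape 1 and a Gaussian at shape 2, so a fitted shape below 2 indicates heavier-than-Gaussian tails. The weight data below is a hypothetical stand-in.

```python
# Hedged sketch: fit a generalized Gaussian distribution (GGD) to weights
# and report the shape parameter. Not the paper's code or data.
import numpy as np
from scipy.stats import gennorm

rng = np.random.default_rng(0)
# Stand-in for a flattened pretrained weight tensor (hypothetical data,
# heavy-tailed on purpose).
weights = rng.standard_t(df=5, size=100_000) * 0.02

beta, loc, scale = gennorm.fit(weights)
print(f"fitted GGD shape beta={beta:.3f}, loc={loc:.3e}, scale={scale:.3e}")
# beta close to 2 would look Gaussian; beta noticeably below 2 suggests
# the heavier-tailed GGD fits the parameters better than a Gaussian.
```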
The proposed framework includes a GG-based initialization scheme, a post-training regularization method called DeepShape, and a hardware-efficient 8-bit floating-point format called RF8 for training BackSlash models with GG-distributed initialization, yielding smaller and faster models with maintained or improved performance.
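The sketch below shows what a GG-based weight initializer might look like; the function name, shape parameter, and Kaiming-style variance target are assumptions for illustration, not the paper's specification, and the DeepShape and RF8 components are not reproduced here.

```python
# Minimal sketch of a GG-style initializer (assumed interface): draw weights
# from a generalized Gaussian instead of the usual normal distribution.
import numpy as np
from scipy.special import gamma
from scipy.stats import gennorm


def gg_init(fan_in: int, fan_out: int, beta: float = 1.0, seed: int = 0) -> np.ndarray:
    """Return a (fan_out, fan_in) weight matrix with GG-distributed entries.

    The scale is chosen so the variance matches an assumed 2/fan_in target
    (as in He initialization); a gennorm variable with shape `beta` and
    scale `alpha` has variance alpha**2 * Gamma(3/beta) / Gamma(1/beta).
    """
    target_var = 2.0 / fan_in
    scale = np.sqrt(target_var * gamma(1.0 / beta) / gamma(3.0 / beta))
    rv = gennorm(beta, loc=0.0, scale=scale)
    return rv.rvs(size=(fan_out, fan_in), random_state=seed)


W = gg_init(fan_in=512, fan_out=512, beta=1.0)
print(W.shape, W.std())  # std should be close to sqrt(2 / 512)
```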
Experiments across various model architectures demonstrated that the proposed framework consistently produced more efficient models than standard training baselines, offering a path toward efficient, scalable, and hardware-aware AI systems.