The article discusses the acceleration mechanisms and hyperparameter sensitivity in Consistency Large Language Models (CLLMs).
The acceleration mechanisms of CLLMs are empirically investigated, focusing on two phenomena that arise during Jacobi decoding: fast-forwarding, where multiple consecutive tokens are correctly predicted in a single forward pass, and stationary tokens, which are predicted correctly early on and remain unchanged through subsequent iterations even when preceded by incorrect tokens.
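For context, the following self-contained sketch shows how a Jacobi decoding trajectory can be collected and how fast-forwarded and stationary tokens might be counted along it. The toy next-token function, the counting heuristics, and all names here are illustrative stand-ins under assumed definitions, not the article's implementation.

```python
import torch

def toy_next_token(context: torch.Tensor, vocab: int = 100) -> int:
    # Deterministic stand-in for an LLM's greedy next-token choice, argmax p(y | context).
    return int((context.sum().item() * 2654435761 + 17) % vocab)

def jacobi_trajectory(prompt: torch.Tensor, n_tokens: int):
    """Iterate an n-token block in parallel until it reaches the fixed point,
    i.e. the sequence greedy autoregressive decoding would produce."""
    block = torch.zeros(n_tokens, dtype=torch.long)   # arbitrary initial guess
    states = [block.clone()]
    while True:
        new_block = block.clone()
        for i in range(n_tokens):                     # conceptually a single parallel forward pass
            new_block[i] = toy_next_token(torch.cat([prompt, block[:i]]))
        states.append(new_block.clone())
        if torch.equal(new_block, block):             # fixed point reached
            return states
        block = new_block

def count_token_types(states):
    """Post-hoc counts of fast-forwarded and stationary tokens along the trajectory."""
    final = states[-1]
    n = len(final)
    fast_forwarded, stationary = 0, 0
    converged = 0                                     # length of the prefix matching the fixed point
    for prev, cur in zip(states[:-1], states[1:]):
        new_converged = converged
        while new_converged < n and cur[new_converged] == final[new_converged]:
            new_converged += 1
        if new_converged - converged > 1:             # several tokens finalized in one pass
            fast_forwarded += new_converged - converged - 1
        # Stationary: correct ahead of the finalized prefix and unchanged in this pass
        # (counted once per pass in this simplified tally).
        for i in range(new_converged, n):
            if cur[i] == final[i] and prev[i] == cur[i]:
                stationary += 1
        converged = new_converged
    return fast_forwarded, stationary

states = jacobi_trajectory(torch.tensor([1, 2, 3]), n_tokens=16)
ff, st = count_token_types(states)
print(f"{len(states) - 1} Jacobi iterations, {ff} fast-forwarded, {st} stationary tokens")
```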
Counts of fast-forwarded and stationary tokens improve by 2.0x to 6.8x across various datasets, with domain-specific datasets showing the largest gains.
Ablation studies reveal the impact of dataset sizes, n-token sequence lengths, and loss designs on CLLMs' performance and speedup gains.
The importance of high-quality Jacobi trajectory datasets for achieving speedup and maintaining generation quality is highlighted.
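For intuition, here is a simplified sketch of how a consistency-style objective might be computed on a collected Jacobi trajectory pair (an intermediate state and its converged fixed point), alongside an autoregressive term that preserves generation quality. The cross-entropy formulation, the HuggingFace-style model interface, and the weighting are illustrative assumptions rather than the article's exact loss design.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, prompt_ids, intermediate_state, fixed_point):
    """Push predictions conditioned on an intermediate Jacobi state toward the
    tokens of the converged fixed point (both blocks have the same length n)."""
    input_ids = torch.cat([prompt_ids, intermediate_state]).unsqueeze(0)
    logits = model(input_ids).logits[0]               # assumes a HF-style causal LM
    n = fixed_point.shape[0]
    # Logits at position prompt_len-1+i are conditioned on the intermediate prefix y_{<i}.
    block_logits = logits[prompt_ids.shape[0] - 1 : prompt_ids.shape[0] - 1 + n]
    return F.cross_entropy(block_logits, fixed_point)

def ar_loss(model, prompt_ids, fixed_point):
    """Standard teacher-forced autoregressive loss on the fixed point."""
    input_ids = torch.cat([prompt_ids, fixed_point]).unsqueeze(0)
    logits = model(input_ids).logits[0]
    n = fixed_point.shape[0]
    block_logits = logits[prompt_ids.shape[0] - 1 : prompt_ids.shape[0] - 1 + n]
    return F.cross_entropy(block_logits, fixed_point)

# Illustrative combined objective, with `w` a tunable weight (an assumption):
# loss = consistency_loss(model, x, y_intermediate, y_star) + w * ar_loss(model, x, y_star)
```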
The use of on-policy generalized knowledge distillation (GKD) is proposed to improve CLLM training efficiency by removing the overhead of Jacobi trajectory collection.
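As a rough illustration of what an on-policy GKD step could look like in this setting, the sketch below has the student generate its own samples and distills them toward a teacher's token-level distributions, so no offline trajectory collection is needed. The generalized JSD divergence, the HF-style generate/forward interfaces, and the hyperparameters are assumptions, not the article's exact recipe.

```python
import torch
import torch.nn.functional as F

def on_policy_gkd_step(student, teacher, prompt_ids, n_tokens, beta=0.5):
    # 1) On-policy sample: the student generates its own continuation.
    with torch.no_grad():
        sample = student.generate(prompt_ids.unsqueeze(0),
                                  max_new_tokens=n_tokens, do_sample=True)
    # 2) Token-level log-probabilities from both models on the student's own sample.
    start = prompt_ids.shape[0] - 1
    student_logits = student(sample).logits[0, start:-1]
    with torch.no_grad():
        teacher_logits = teacher(sample).logits[0, start:-1]
    p = F.log_softmax(teacher_logits, dim=-1)          # teacher distribution (detached)
    q = F.log_softmax(student_logits, dim=-1)          # student distribution (trainable)
    # 3) Generalized JSD: beta * KL(P || M) + (1-beta) * KL(Q || M), M = beta*P + (1-beta)*Q.
    log_m = torch.logsumexp(torch.stack([p + torch.log(torch.tensor(beta)),
                                         q + torch.log(torch.tensor(1.0 - beta))]), dim=0)
    loss = beta * F.kl_div(log_m, p, log_target=True, reduction="batchmean") \
         + (1.0 - beta) * F.kl_div(log_m, q, log_target=True, reduction="batchmean")
    return loss
```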
Results also indicate that CLLMs remain robust when trained on pre-training workloads, suggesting the approach could be adapted to LLM pre-training while preserving both the speedup and language modeling capability.
The full article, including these findings and proposed methods, is available on arXiv under a CC0 1.0 Universal license.