Large language models (LLMs) are increasingly deployed on mobile devices, but limited DRAM capacity constrains the size of the models that can be served. We introduce ActiveFlow, an LLM inference framework that enables adaptive DRAM usage for modern LLMs. ActiveFlow combines two novel techniques: cross-layer active weights preloading and sparsity-aware self-distillation. Together, these allow the framework to reach the performance-cost Pareto frontier relative to existing optimization methods.
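To make the preloading idea concrete, the following is a minimal sketch of how cross-layer active weights preloading might be pipelined: while layer i computes, the predicted active weights of layer i+1 are fetched from flash into a small DRAM cache. All names here (predict_active, load_from_flash, forward) are hypothetical illustrations, not the paper's actual API.

```python
import threading

# Hypothetical sketch; these callables are assumptions, not ActiveFlow's API:
#   predict_active(i, x): predicts which weight rows of layer i will be active
#   load_from_flash(i, idx): copies those rows from flash into a DRAM cache
#   forward(i, x, w): runs layer i using only the cached active weights

def pipelined_forward(x, num_layers, predict_active, load_from_flash, forward):
    """Overlap layer-i compute with the flash load of layer i+1's predicted
    active weights, so DRAM holds only a few layers' hot set at a time."""
    cache = {0: load_from_flash(0, predict_active(0, x))}
    for i in range(num_layers):
        loader = None
        if i + 1 < num_layers:
            # Cross-layer prediction: choose layer i+1's active set from the
            # current activation, before layer i's compute finishes.
            idx = predict_active(i + 1, x)
            loader = threading.Thread(
                target=lambda j=i + 1, k=idx:
                    cache.__setitem__(j, load_from_flash(j, k)))
            loader.start()
        x = forward(i, x, cache.pop(i))  # compute the current layer
        if loader is not None:
            loader.join()                # next layer's weights are now resident
    return x
```

Shrinking or growing the per-layer active set directly trades DRAM footprint against accuracy, which is one plausible reading of how the framework adapts its DRAM usage.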