A new study introduces a causal representation learning framework for evaluating language model capabilities.
The framework models benchmark performance as a linear transformation of a few latent capability factors.
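As a rough illustration of what such a linear latent-factor model looks like (this is a sketch, not the study's actual code; the factor count matches the reported structure, but all data here is simulated), one could fit it roughly as follows:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_models, n_benchmarks, n_factors = 1500, 6, 3

# Simulate benchmark scores as a linear transformation of latent
# capability factors plus noise: S = Z @ A.T + eps.
Z = rng.normal(size=(n_models, n_factors))       # latent capabilities
A = rng.normal(size=(n_benchmarks, n_factors))   # loading matrix
scores = Z @ A.T + 0.1 * rng.normal(size=(n_models, n_benchmarks))

# Recover a low-dimensional factor representation from observed scores.
fa = FactorAnalysis(n_components=n_factors, random_state=0)
latent = fa.fit_transform(scores)                # estimated factors per model
print(latent.shape)                              # (1500, 3)
```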
These latent factors turn out to be causally interrelated once the base model is controlled for as a common confounder.
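One standard way to control for a shared confounder such as the base model is to residualize factor scores within base-model groups before any causal analysis. A minimal sketch of that idea (the base-model and column names below are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Hypothetical table: one row per fine-tuned model, with its base model
# and estimated latent factor scores (all values are illustrative).
df = pd.DataFrame({
    "base_model": ["llama-3-8b", "llama-3-8b", "mistral-7b", "mistral-7b"],
    "problem_solving": [0.62, 0.58, 0.45, 0.49],
    "instruction_following": [0.71, 0.66, 0.52, 0.55],
    "math_reasoning": [0.40, 0.37, 0.28, 0.31],
})

factor_cols = ["problem_solving", "instruction_following", "math_reasoning"]

# Subtract each base model's group mean, removing base-model-level
# variation so it cannot masquerade as a causal link between factors.
residuals = df[factor_cols] - df.groupby("base_model")[factor_cols].transform("mean")
print(residuals)
```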
The study analyzed over 1,500 models across six benchmarks and identified a concise three-node linear causal structure explaining the variation in their performance.
The recovered structure traces a causal pathway from general problem-solving capability, through instruction-following proficiency, to mathematical reasoning ability.
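A chain like this can be written as a two-equation linear structural model, and its signature is that the first and last factors become (approximately) independent once the middle factor is conditioned on. A hedged sketch of that check, with coefficients invented purely for illustration rather than taken from the study:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1500

# Hypothetical linear causal chain among the three latent factors:
# problem_solving -> instruction_following -> math_reasoning.
# Coefficients and noise scales are illustrative, not estimates.
problem_solving = rng.normal(size=n)
instruction_following = 0.8 * problem_solving + 0.3 * rng.normal(size=n)
math_reasoning = 0.7 * instruction_following + 0.3 * rng.normal(size=n)

def residual(y, x):
    """Residual of y after OLS regression on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Partial correlation of the chain's endpoints given the middle node;
# it should be near zero if the chain structure holds.
r1 = residual(math_reasoning, instruction_following)
r2 = residual(problem_solving, instruction_following)
partial_corr = np.corrcoef(r1, r2)[0, 1]
print(f"partial correlation: {partial_corr:.3f}")
```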
The results underscore the importance of controlling for base-model variation during evaluation in order to accurately uncover causal relationships among latent model capabilities.