DeepSeek-GRM introduces a new approach to reward modeling for large language modelsUses Self-Principled Critique Tuning (SPCT) to improve inference-time scalabilityGenerates principles and critiques adaptively for better reward signalsOutperforms existing methods across various benchmarks without severe biases