Reinforcement learning (RL) has been widely adopted in post-training large language models (LLMs) at scale. This work investigates how to improve reward modeling (RM) with more inference compute for general queries, and proposes Self-Principled Critique Tuning (SPCT) to foster scalable reward-generation behaviors in generative reward models (GRMs). The study shows that SPCT improves the quality and scalability of GRMs, outperforming existing methods and models.
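The summary above does not spell out how the extra inference compute is spent. A common instantiation of inference-time scaling for a generative reward model is to sample several independent critiques for the same query-response pair and aggregate their scores; the sketch below illustrates only this generic idea. The function names (`generate_critique`, `scaled_reward`) and the mean/vote aggregation are illustrative assumptions, not the paper's method or API.

```python
from collections import Counter
from statistics import mean

def generate_critique(query: str, response: str, seed: int) -> dict:
    """Hypothetical stand-in for one sampled generation from a generative reward model.

    Returns a textual critique plus a discrete score (e.g. 1-10). A real GRM call
    would replace this placeholder.
    """
    return {"critique": f"sampled critique #{seed}", "score": (seed % 10) + 1}

def scaled_reward(query: str, response: str, k: int = 8) -> float:
    """Spend more inference compute by drawing k critiques and aggregating their scores."""
    scores = [generate_critique(query, response, seed)["score"] for seed in range(k)]
    # Majority vote over discrete scores is one aggregation option; here we
    # return the mean for simplicity. Either choice is an assumption of this sketch.
    majority_score, _ = Counter(scores).most_common(1)[0]
    return mean(scores)

if __name__ == "__main__":
    # Toy usage: reward for a single query-response pair with 8 sampled critiques.
    print(scaled_reward("Explain RL post-training.", "RL optimizes a policy...", k=8))
```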