DeepSeek R1 is trained with Group Relative Policy Optimization (GRPO), which samples a group of candidate outputs for each prompt, scores them, and reinforces the ones that score above the group average. Rather than relying on a single fixed reward function, DeepSeek R1 selects reward functions according to the task and goals of the prompt. Multiple reward functions can be combined to balance accuracy, speed, compliance, and safety. This approach is intended to make the model's reasoning more transparent and easier to understand and trust.
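The group-relative scoring described above can be illustrated with a minimal sketch. It is not DeepSeek's implementation: the reward functions (`accuracy_reward`, `brevity_reward`), the `REWARDS_BY_TASK` table, and the weights are hypothetical stand-ins chosen for illustration. The only part taken from GRPO itself is the advantage formula, which normalizes each output's reward by the mean and standard deviation of its group.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Tuple

# A reward function maps (prompt, output) to a score. All names below are
# hypothetical examples, not part of any DeepSeek API.
RewardFn = Callable[[str, str], float]


def accuracy_reward(prompt: str, output: str) -> float:
    """Toy correctness check: 1.0 if the output ends with the expected answer."""
    return 1.0 if output.strip().endswith("4") else 0.0


def brevity_reward(prompt: str, output: str) -> float:
    """Toy stand-in for 'speed': shorter outputs score higher."""
    return 1.0 / (1.0 + len(output.split()))


# Assumed mapping from task type to weighted reward functions.
REWARDS_BY_TASK: Dict[str, List[Tuple[RewardFn, float]]] = {
    "math": [(accuracy_reward, 1.0), (brevity_reward, 0.2)],
}


def combined_reward(prompt: str, output: str, task: str) -> float:
    """Weighted sum of the reward functions selected for this task."""
    return sum(w * fn(prompt, output) for fn, w in REWARDS_BY_TASK[task])


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: (reward - group mean) / group standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, a group of sampled candidate outputs.
prompt = "What is 2 + 2?"
group = ["The answer is 4", "2 + 2 equals 5", "4"]

rewards = [combined_reward(prompt, out, "math") for out in group]
advantages = group_relative_advantages(rewards)

for out, adv in zip(group, advantages):
    print(f"{adv:+.3f}  {out!r}")
```

Outputs with a positive advantage scored above the group average and would be made more likely by the policy update; outputs with a negative advantage would be made less likely. Because the baseline comes from the group itself, no separate value (critic) model is needed.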