OpenAI has released a research report on fixing reward hacking in reasoning models.The report explores strategies to monitor and mitigate reward hacking behaviors.OpenAI demonstrates how to monitor models for reward hacking using chain-of-thoughts observation.When monitoring chain-of-thoughts, OpenAI recalled 95% of the hacks, compared to 60% with action-only monitoring.