<ul><li>Reinforcement learning with verifiable rewards (RLVR) has been successful in improving math and coding in large language models.</li><li>Efforts are being made to extend RLVR into real-world domains like forecasting, where outcome-based reinforcement learning is crucial.</li><li>New adaptations of algorithms, Group-Relative Policy Optimisation (GRPO) and ReMax, have shown promising results in forecasting accuracy and calibration.</li><li>Refined RLVR methods can potentially transform small-scale language models into economically valuable forecasting tools, with implications for scaling to larger models.</li></ul>

Outcome-based Reinforcement Learning to Predict the Future

Discover more