This paper presents a theoretical analysis of reinforcement learning frameworks for the self-taught reasoner (STaR) in large language models (LLMs). The STaR framework uses reinforcement learning to generate reasoning steps, reducing dependence on human-labeled data. The analysis provides a theoretical account of why reinforcement learning is effective for chain-of-thought (CoT) reasoning and for STaR, examining criteria on the pre-trained model, policy improvement, convergence, and the robustness of STaR in improving reasoning in LLMs.
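To make the object of the analysis concrete, the following is a minimal sketch of a STaR-style self-training loop: sample CoT rationales, keep only those whose final answer is correct (a binary reward), and fine-tune on the surviving traces. The helpers `sample_rationale` and `finetune` are hypothetical placeholders for illustration, not interfaces defined by the paper.

```python
# A minimal sketch of the STaR outer loop, under stated assumptions.
# `sample_rationale` and `finetune` are hypothetical helpers, not from the paper.

from typing import Callable, List, Tuple

Example = Tuple[str, str]     # (question, gold answer)
Trace = Tuple[str, str, str]  # (question, rationale, answer)

def star_loop(
    model,                       # current policy (an LLM)
    dataset: List[Example],
    sample_rationale: Callable,  # (model, question) -> (rationale, answer)
    finetune: Callable,          # (model, traces) -> improved model
    n_iters: int = 5,
    k_samples: int = 4,
):
    """Each iteration samples CoT rationales, filters them by answer
    correctness (reward = 1 iff the answer matches the gold label), and
    fine-tunes on the filtered traces -- the self-improvement step whose
    policy-improvement and convergence properties the analysis studies."""
    for _ in range(n_iters):
        accepted: List[Trace] = []
        for question, gold in dataset:
            for _ in range(k_samples):
                rationale, answer = sample_rationale(model, question)
                if answer == gold:        # keep only correct traces
                    accepted.append((question, rationale, answer))
                    break                 # first correct trace suffices
        model = finetune(model, accepted) # policy improvement step
    return model
```

The filter-then-fine-tune structure is what casts STaR as reinforcement learning with a correctness reward, which is the setting the convergence and robustness results address.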