A new method for language model pre-training, called Actor-Critic based Online Data Mixing (AC-ODM), has been developed. AC-ODM captures varying domain weights using auxiliary actor-critic networks and accounts for intra-domain interactions through its reward function. The actor is first trained with a small proxy language model as the environment and is then applied directly as the data sampling strategy when pre-training a larger target model. Numerical results show that AC-ODM-410M converges significantly faster and reaches higher accuracy than existing methods.
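To make the general mechanism concrete, here is a minimal sketch of an actor-critic loop whose actor outputs sampling weights over data domains. It is not AC-ODM's architecture: the summary does not specify the networks, state, or reward. The sketch instead assumes a generic one-step actor-critic, with a linear softmax actor over domains and a linear critic trained by TD(0). The names `LinearActorCritic` and `train_toy`, the one-feature state, and the toy reward (only domain 2 pays off) are all hypothetical. In a real pipeline the reward would come from the proxy model's training signal, and the trained actor's `domain_weights` would then be queried while pre-training the larger target model.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearActorCritic:
    """Hypothetical linear actor (domain weights) and linear critic (state value).

    A generic one-step actor-critic, NOT the AC-ODM networks.
    """

    def __init__(self, n_domains, n_features, actor_lr=0.1, critic_lr=0.1, gamma=0.9):
        self.theta = [[0.0] * n_features for _ in range(n_domains)]  # actor parameters
        self.w = [0.0] * n_features  # critic parameters
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.gamma = gamma

    def domain_weights(self, state):
        # Actor: softmax over per-domain linear scores gives sampling weights.
        logits = [sum(t * s for t, s in zip(row, state)) for row in self.theta]
        return softmax(logits)

    def value(self, state):
        # Critic: linear estimate of the state value V(s).
        return sum(wi * s for wi, s in zip(self.w, state))

    def sample_domain(self, state, rng):
        # Draw the domain for the next batch according to the actor's weights.
        weights = self.domain_weights(state)
        return rng.choices(range(len(weights)), weights=weights, k=1)[0]

    def update(self, state, domain, reward, next_state):
        # TD error: delta = r + gamma * V(s') - V(s)
        delta = reward + self.gamma * self.value(next_state) - self.value(state)
        # Policy probabilities are taken before any parameter changes.
        probs = self.domain_weights(state)
        # Critic: semi-gradient TD(0) step.
        for j, s in enumerate(state):
            self.w[j] += self.critic_lr * delta * s
        # Actor: gradient of log-softmax w.r.t. theta_k is (1[k == a] - pi_k) * s.
        for k, row in enumerate(self.theta):
            coeff = (1.0 if k == domain else 0.0) - probs[k]
            for j, s in enumerate(state):
                row[j] += self.actor_lr * delta * coeff * s
        return delta


def train_toy(steps=2000, seed=0):
    """Toy run with a constant one-feature state and a made-up reward:
    only domain 2 yields reward 1.0, the others yield 0.0."""
    rng = random.Random(seed)
    agent = LinearActorCritic(n_domains=3, n_features=1)
    state = [1.0]
    for _ in range(steps):
        domain = agent.sample_domain(state, rng)
        reward = 1.0 if domain == 2 else 0.0
        agent.update(state, domain, reward, state)
    return agent.domain_weights(state)
```

Calling `train_toy()` returns a probability vector over the three domains in which domain 2 carries the largest weight. The critic acts as a learned baseline, so domains that yield below-average reward are down-weighted rather than simply ignored.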