Reinforcement Learning Finetuning (RFT) has advanced the reasoning capabilities of large language models (LLMs), leading to better tool use. VTool-R1 is a framework that trains vision-language models (VLMs) to generate multimodal chains of thought that combine text and visual reasoning steps. It integrates visual editing tools into the RFT process, enabling VLMs to learn when and how to use visual reasoning steps. Experiments show that VTool-R1 improves reasoning performance by teaching VLMs to think with images and generate multimodal chains of thought.