Unicorn synthesizes text-only data for training Vision Language Models (VLMs).
Eliminates the need for image generation during training.
Reduces computational cost by 37x compared to methods that use synthetic images.
Demonstrates that VLMs can learn visual concepts from purely textual data.