Genmo has released Mochi 1, an open-source model that generates high-quality videos from text prompts; its output can also serve as synthetic training data for AI models in robotics and autonomous systems.
Genmo claims that Mochi 1 outperforms Runway’s Gen-3 Alpha, Luma AI’s Dream Machine, Kuaishou’s Kling, Minimax’s Hailuo, and many others in terms of prompt adherence and motion quality.
Under the Apache 2.0 license, Mochi 1 offers users free access to cutting-edge video generation, while Mochi 1 HD, a higher-definition version, is expected to launch later in 2024.
Genmo’s co-founders aim to make cutting-edge AI technology accessible, stating that “it’s really important to democratise this technology and put it in the hands of as many people as possible. That’s one reason we’re open sourcing it.”
The release of Mochi 1 previews several significant advancements in video generation, including high-fidelity motion, strong prompt adherence and precise control over characters, settings and actions in generated videos.
Genmo raised $28.4m in a Series A funding round for Mochi 1’s development, led by NEA and with backing from Essence VC, The House Fund, WndrCo, Gold House Ventures and Eastlink Capital Partners.
Mochi 1 is built on the Asymmetric Diffusion Transformer architecture and, at ten billion parameters, is the largest openly released video generation model to date; its asymmetric design dedicates roughly four times as many parameters to visual reasoning as to text processing.
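To illustrate what that asymmetry means in practice, the sketch below shows a toy dual-stream transformer block in which the visual stream uses a wider hidden dimension than the text stream, so it holds roughly four times the parameters, while both streams are projected into a shared space for joint attention. The module names, dimensions and overall structure are assumptions for illustration only, not Genmo’s actual implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: all dimensions and names below are assumed for the
# example and are NOT taken from Genmo's released code.
VISUAL_DIM = 3072   # wider visual stream
TEXT_DIM = 1536     # narrower text stream (~4x fewer MLP parameters than visual)
NUM_HEADS = 24
HEAD_DIM = 128      # shared attention head size so both streams attend jointly


class AsymmetricBlock(nn.Module):
    """One dual-stream block: each modality keeps its own width, but tokens
    from both streams are projected into a shared attention space."""

    def __init__(self):
        super().__init__()
        attn_dim = NUM_HEADS * HEAD_DIM
        # Separate projections sized to each stream's width.
        self.visual_qkv = nn.Linear(VISUAL_DIM, 3 * attn_dim)
        self.text_qkv = nn.Linear(TEXT_DIM, 3 * attn_dim)
        self.attn = nn.MultiheadAttention(attn_dim, NUM_HEADS, batch_first=True)
        self.visual_out = nn.Linear(attn_dim, VISUAL_DIM)
        self.text_out = nn.Linear(attn_dim, TEXT_DIM)
        # Per-stream feed-forward networks; the wider visual MLP dominates the
        # block's parameter count (roughly 4x the text MLP at these widths).
        self.visual_mlp = nn.Sequential(
            nn.Linear(VISUAL_DIM, 4 * VISUAL_DIM), nn.GELU(),
            nn.Linear(4 * VISUAL_DIM, VISUAL_DIM))
        self.text_mlp = nn.Sequential(
            nn.Linear(TEXT_DIM, 4 * TEXT_DIM), nn.GELU(),
            nn.Linear(4 * TEXT_DIM, TEXT_DIM))

    def forward(self, visual_tokens, text_tokens):
        # Project both streams into the shared attention space and attend jointly.
        vq, vk, vv = self.visual_qkv(visual_tokens).chunk(3, dim=-1)
        tq, tk, tv = self.text_qkv(text_tokens).chunk(3, dim=-1)
        q = torch.cat([vq, tq], dim=1)
        k = torch.cat([vk, tk], dim=1)
        v = torch.cat([vv, tv], dim=1)
        joint, _ = self.attn(q, k, v)
        n_vis = visual_tokens.shape[1]
        visual_tokens = visual_tokens + self.visual_out(joint[:, :n_vis])
        text_tokens = text_tokens + self.text_out(joint[:, n_vis:])
        return (visual_tokens + self.visual_mlp(visual_tokens),
                text_tokens + self.text_mlp(text_tokens))


block = AsymmetricBlock()
vis = torch.randn(1, 16, VISUAL_DIM)   # e.g. 16 video latent tokens
txt = torch.randn(1, 8, TEXT_DIM)      # e.g. 8 prompt tokens
vis_out, txt_out = block(vis, txt)
print(vis_out.shape, txt_out.shape)    # (1, 16, 3072) (1, 8, 1536)
```

The point of the asymmetry is that most of the model's capacity goes to reasoning over video tokens, while a much smaller text pathway is still enough to condition generation on the prompt.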
Though the Mochi 1 preview has limitations, Genmo plans to add image-to-video synthesis, improve model controllability and deliver even greater motion fidelity.
Mochi 1’s release offers researchers, developers and product teams new applications in advertising, education and entertainment as well as the ability to generate synthetic data for AI models in robotics and autonomous systems.
Genmo says it is investing heavily in motion quality, an area where it claims an advantage over competing models.
Genmo invites users to try Mochi 1 at genmo.ai/play.