AI2 has released Molmo, a visual understanding engine. The multimodal AI model aims to match Google's best models while being open source and free.
Molmo can answer questions about what it sees in an image, with tasks ranging from understanding everyday situations to identifying objects.
Molmo performs on par with the likes of GPT-4o, Gemini 1.5 Pro and Claude 3.5 Sonnet while being much smaller, at about a tenth of their size.
AI2 curated and annotated a set of only 600,000 images, compared with the billions used to train other models.
Molmo produces image descriptions that are conversational and useful, thanks to an unusual annotation method in which people describe the images out loud.
Molmo's dataset and code are completely free and open-source, empowering developers to make AI-powered apps, services and experiences without seeking permission or paying tech giants.
The AI giants are themselves lowering their prices and raising hundreds of millions to cover the cost, while smaller players like AI2 are successfully releasing free, open-source models like Molmo.