French AI startup Pleias has released two small reasoning models optimized for retrieval-augmented generation, citation synthesis, and structured multilingual output.
The models, Pleias-RAG-350M and Pleias-RAG-1B, are based on the Pleias 1.0 family of language models and available in CPU-optimized GGUF format.
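Because the models ship in GGUF format, they can be run locally on commodity hardware with standard tooling. The following is a minimal sketch using llama-cpp-python; the file name, context size, and thread count are illustrative assumptions, not values from the Pleias release.

```python
from llama_cpp import Llama

# Load a local GGUF file on CPU; the exact file name depends on the
# quantization variant downloaded from the release (assumed here).
llm = Llama(
    model_path="pleias-rag-350m.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=4096,    # context window for the query plus retrieved sources
    n_threads=8,   # CPU threads; tune to the host machine
)

output = llm(
    "Summarize the key claims in the provided sources.",
    max_tokens=256,
    temperature=0.0,  # deterministic output for reproducible RAG answers
)
print(output["choices"][0]["text"])
```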
These models aim to provide cost-effective alternatives to large-scale language models without compromising traceability, multilingual capabilities, or structured reasoning workflows.
Pleias frames its design choice of built-in source citation as an ethical imperative, one that aligns with regulatory demands for explainable AI.
The Pleias-RAG models can autonomously assess a query, gauge its complexity, and decide how to respond based on the adequacy of the retrieved sources, producing structured, reasoned answers.
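In practice, this workflow means packaging the query and retrieved passages into a single prompt that the model can ground its answer in. The sketch below shows one generic way to do this; the numbered-source tag format and the instruction wording are assumptions for illustration, not the Pleias-RAG prompt template.

```python
def build_rag_prompt(query: str, sources: list[str]) -> str:
    """Assemble a query and numbered source passages into one prompt.

    The [n] tag format here is illustrative; the actual Pleias-RAG
    template is defined by the model's release documentation.
    """
    numbered = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(sources, start=1)
    )
    return (
        f"Sources:\n{numbered}\n\n"
        f"Question: {query}\n"
        "Answer using only the sources above and cite them as [n]. "
        "If the sources are insufficient, say so instead of guessing."
    )
```

Numbering the passages up front gives the model stable anchors to cite, which is what makes the generated [n] references traceable back to specific sources.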
Despite their small size, the Pleias-RAG models exhibit behavior typically associated with much larger systems while running efficiently on standard CPUs.
In benchmark evaluations, the models outperform larger alternatives on tasks such as HotPotQA and hold up well in multilingual scenarios, with minimal performance degradation across languages.
The models' multilingual support rests on careful tokenizer design and adversarial training exercises that target language switching.
Pleias envisions its models augmenting existing AI models in orchestration settings, emphasizing their cost-effectiveness and complementary role, as sketched below.
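One plausible orchestration pattern is to route queries to the small model first and escalate to a larger model only when the sources prove inadequate. The sketch below is a hypothetical illustration of that pattern; `small_llm`, `large_llm`, and the refusal check are assumptions, not part of the Pleias release.

```python
def answer_with_escalation(query: str, sources: list[str],
                           small_llm, large_llm) -> str:
    """Route a query to the small RAG model first, escalating if needed.

    `small_llm` and `large_llm` are hypothetical callables mapping a
    prompt string to generated text; the string check below is a naive
    stand-in for whatever structured signal the small model emits.
    """
    numbered = "\n\n".join(f"[{i}] {s}" for i, s in enumerate(sources, 1))
    prompt = f"Sources:\n{numbered}\n\nQuestion: {query}"
    draft = small_llm(prompt)
    if "insufficient" in draft.lower():  # assumed insufficiency signal
        return large_llm(prompt)         # escalate only when necessary
    return draft
```

Under this division of labor, the cheap CPU-bound model handles the bulk of straightforward queries, and the expensive model is invoked only for the residue it cannot answer.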
The models are released under the Apache 2.0 license, permitting commercial reuse and integration into a wide range of systems and applications.