M-Prometheus is a suite of LLM judges designed to evaluate text in multiple languages.
Existing LLM judges work well for English but poorly for other languages, so AI systems that produce non-English text are evaluated less reliably.
M-Prometheus models range from 3B to 14B parameters and outperform existing open LLM judges.
Two factors are key to the success of M-Prometheus: the choice of backbone model and the use of native multilingual data.