Meta AI has developed a machine learning library for evaluating text-to-image generative models, with support for a range of metrics, datasets, and visualizations.
The library, called EvalGIM, also introduces a feature called "Evaluation Exercises," which synthesize performance insights to answer specific research questions.
Researchers who collaborated on the project are based at FAIR at Meta, Mila (Quebec AI Institute), Univ. Grenoble Alpes (Inria, CNRS, Grenoble INP, LJK, France), and McGill University, and include a Canada CIFAR AI Chair holder.
The library supports real-image datasets such as MS-COCO and GeoDE, the latter offering insight into performance across geographic regions.
Prompt-only datasets such as PartiPrompts and T2I-CompBench are included to test models across diverse text-input scenarios, and EvalGIM is compatible with popular tools such as HuggingFace diffusers.
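To illustrate what that compatibility enables, the minimal sketch below generates images with a standard diffusers pipeline; the model name, prompts, and output paths are illustrative placeholders, and EvalGIM's own wrapper interface may differ.

```python
import torch
from diffusers import DiffusionPipeline

# Load any diffusers-compatible text-to-image pipeline (model name is illustrative).
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Prompts could come from a prompt-only dataset such as PartiPrompts.
prompts = [
    "a photo of an astronaut riding a horse",
    "a watercolor painting of a fox in a forest",
]

# Generate one image per prompt; in an evaluation workflow these images would
# then be scored by metrics such as FID or prompt-consistency measures.
images = [pipe(p, num_inference_steps=30).images[0] for p in prompts]
for i, img in enumerate(images):
    img.save(f"sample_{i}.png")
```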
Several Evaluation Exercises are structured around common evaluation questions; the Trade-offs Exercise, for example, examines how models balance quality, diversity, and consistency over the course of training.
Researchers found that consistency metrics showed steady improvement during early training stages, but plateaued after about 450,000 iterations.
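As a concrete example of a prompt-consistency metric of the kind tracked across training checkpoints, the sketch below computes CLIPScore with the torchmetrics library; the random image tensors stand in for generated samples, and this is an illustration rather than EvalGIM's own implementation.

```python
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

# CLIPScore measures image-text alignment, i.e. how consistent a generated
# image is with its prompt.
metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# Placeholder batch: 4 "generated" images (uint8, CHW) and their prompts.
images = torch.randint(0, 255, (4, 3, 256, 256), dtype=torch.uint8)
prompts = [
    "a red bicycle leaning against a brick wall",
    "two dogs playing in the snow",
    "a bowl of ramen on a wooden table",
    "a lighthouse at sunset",
]

score = metric(images, prompts)  # averaged CLIPScore over the batch
print(f"CLIPScore: {score.item():.2f}")
```

In a Trade-offs-style analysis, a score like this would be computed at successive checkpoints and plotted alongside quality and diversity metrics to see where each curve flattens out.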
The Evaluation Exercises also assessed geographic performance disparities using the GeoDE dataset, showing that Southeast Asia and Europe benefited most from advances in latent diffusion models.
A Ranking Robustness Exercise demonstrated how model rankings vary depending on the metric and dataset used.
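To make that idea concrete, the sketch below ranks a few anonymized models under two metrics with pandas; the model names and scores are invented placeholders, meant only to show how an ordering can flip when the metric changes.

```python
import pandas as pd

# Invented placeholder scores for three anonymized models under two metrics
# (not real results from the paper).
scores = pd.DataFrame(
    {
        "FID (lower is better)": [12.4, 11.9, 13.1],
        "CLIPScore (higher is better)": [31.2, 30.8, 31.5],
    },
    index=["model_A", "model_B", "model_C"],
)

# Rank each metric in its "better" direction: ascending for FID,
# descending for CLIPScore.
rankings = pd.DataFrame(
    {
        "FID rank": scores["FID (lower is better)"].rank(ascending=True),
        "CLIPScore rank": scores["CLIPScore (higher is better)"].rank(ascending=False),
    }
)
print(rankings)  # the ordering of models differs between the two columns
```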
The exercises also showed that combining original and recaptioned training data improved model performance across datasets.