UniF$^2$ace is a unified multimodal model (UMM) tailored for fine-grained face understanding and generation, two tasks that existing research in the face domain does not handle well.
The model is trained on a specialized dataset, UniF$^2$ace-130K, comprising image-text pairs and question-answering pairs that cover a wide range of facial attributes.
UniF$^2$ace combines diffusion techniques with a mixture-of-experts architecture to improve both its understanding and its generation capabilities, and it surpasses existing UMMs and generative models.
Extensive experiments on UniF$^2$ace-130K show that the model performs better on fine-grained facial attributes in both understanding and generation tasks.
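
To make the mixture-of-experts idea mentioned above concrete, the following is a minimal sketch of generic top-k gated routing. It is not the UniF$^2$ace architecture, whose expert design is not described here; the class `ToyMoE`, its methods `route` and `forward`, and all sizes and weights are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ToyMoE:
    """Hypothetical top-k gated mixture of linear experts (illustration only)."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.top_k = top_k
        self.router = rand(num_experts, dim)  # one logit row per expert
        self.experts = [rand(dim, dim) for _ in range(num_experts)]

    def route(self, x):
        """Return the indices of the top-k experts and their normalized weights."""
        logits = matvec(self.router, x)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = order[: self.top_k]
        weights = softmax([logits[i] for i in top])
        return top, weights

    def forward(self, x):
        """Weighted sum of the selected experts' outputs for one token vector."""
        top, weights = self.route(x)
        out = [0.0] * len(x)
        for idx, w in zip(top, weights):
            y = matvec(self.experts[idx], x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out
```

In this sketch only `top_k` of the experts run for each input vector, which is the property that lets a mixture-of-experts layer add capacity without running every expert on every token.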