Customization in generative AI typically involves training a personalized model, such as a low-rank adaptation (LoRA), on a user's own photos so that the user's identity can appear in generated outputs.
Customization techniques emerged after the release of Stable Diffusion: projects like DreamBooth produced multi-gigabyte models, which were later displaced by lighter and more cost-effective LoRA models.
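To see why a LoRA is so much lighter than a full fine-tune, a rough sketch of the arithmetic for a single dense layer may help. The layer size (768 x 768) and rank (8) are illustrative assumptions, not figures taken from DreamBooth, Stable Diffusion, or HyperLoRA.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters updated when a dense layer's weight matrix is fine-tuned directly."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a LoRA update delta_W = B @ A, with B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes only (assumed, not from the source).
d_out, d_in, rank = 768, 768, 8
full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
ratio = full / lora
print(f"full={full} lora={lora} ratio={ratio:.0f}x")  # full=589824 lora=12288 ratio=48x
```

The saving grows with layer width, because the full update scales with the product of the two dimensions while the low-rank update scales with their sum.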
Zero-shot customization approaches aim to simplify this process: the system interprets user-provided photos directly and produces personalized outputs without extensive per-user training.
HyperLoRA takes a different route, generating LoRA weights on the fly for zero-shot personalized portrait generation with high photorealism and editability.
HyperLoRA's training isolates different kinds of information in separate learned weights, so that identity-relevant features are not influenced by irrelevant elements.
HyperLoRA's three-stage training procedure first learns a Base-LoRA, then introduces an ID-LoRA that encodes facial structure and identity using a CLIP Vision Transformer (ViT) and the AntelopeV2 encoder.
HyperLoRA's phased structure disentangles identity features from non-identity features, improving both fidelity and editability in personalized image generation.
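One way to picture the disentanglement is as two separate weight updates added to a frozen layer: one for identity (ID-LoRA) and one for everything else (Base-LoRA). The sketch below is a toy illustration under that assumption. The function name, the numbers, and in particular the choice to drop the Base-LoRA update at inference are hypothetical and are not confirmed by the text above.

```python
def add(m1, m2):
    """Element-wise sum of two equally shaped matrices (lists of rows)."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def effective_weight(frozen, id_delta, base_delta=None):
    """Frozen weight plus the ID-LoRA update, optionally plus the Base-LoRA update."""
    out = add(frozen, id_delta)
    if base_delta is not None:
        out = add(out, base_delta)
    return out


# Made-up 2x2 values, chosen only to keep the arithmetic easy to follow.
frozen = [[1.0, 0.0], [0.0, 1.0]]
id_delta = [[0.5, 0.0], [0.0, 0.25]]        # identity-relevant update
base_delta = [[0.0, 0.125], [0.125, 0.0]]   # non-identity update

with_both = effective_weight(frozen, id_delta, base_delta)
identity_only = effective_weight(frozen, id_delta)  # assumed inference-time choice
print(with_both)      # [[1.5, 0.125], [0.125, 1.25]]
print(identity_only)  # [[1.5, 0.0], [0.0, 1.25]]
```

Because the two updates live in separate weights, removing one leaves the other untouched, which is the property the phased training is meant to secure.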
The system uses CLIP ViT and AntelopeV2 to extract structural and identity-specific features, then passes them through resamplers to generate full LoRA weights on the fly.
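The idea of producing LoRA weights from image features, rather than training them per user, can be sketched with a toy hypernetwork. This is a minimal illustration, not HyperLoRA's architecture: a single linear projection stands in for the resamplers, a hand-written feature vector stands in for the CLIP ViT and AntelopeV2 outputs, and the class name `ToyHyperLoRA`, all dimensions, and the zero initialization of the B projection are assumptions.

```python
import random


def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def linear(vec, weight):
    """Project a feature vector of length n with an (m x n) matrix, giving length m."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def reshape(flat, rows, cols):
    """Reshape a flat list into a rows x cols matrix."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


class ToyHyperLoRA:
    """Hypothetical stand-in for the resampler stage: maps features to LoRA factors A and B."""

    def __init__(self, feat_dim, d_out, d_in, rank, seed=0):
        rng = random.Random(seed)
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        self.proj_a = [[rng.gauss(0, 0.02) for _ in range(feat_dim)]
                       for _ in range(rank * d_in)]
        # Assumption: the B projection starts at zero, so the generated update
        # is zero before any training (a common LoRA initialization convention).
        self.proj_b = [[0.0] * feat_dim for _ in range(d_out * rank)]

    def generate(self, features):
        """Return (A, B) with A of shape rank x d_in and B of shape d_out x rank."""
        a = reshape(linear(features, self.proj_a), self.rank, self.d_in)
        b = reshape(linear(features, self.proj_b), self.d_out, self.rank)
        return a, b


def apply_lora(weight, a, b, scale=1.0):
    """Return weight + scale * (B @ A) without modifying the frozen weight."""
    delta = matmul(b, a)
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(weight, delta)]


features = [0.5, -1.0, 0.25, 2.0]  # stand-in for encoder output (assumed)
hyper = ToyHyperLoRA(feat_dim=4, d_out=3, d_in=3, rank=2)
a, b = hyper.generate(features)
base_weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
adapted = apply_lora(base_weight, a, b)
```

In this sketch only the projection matrices would be trained. At use time a new face costs one forward pass through `generate`, which is what removes the per-user fine-tuning step.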
HyperLoRA was trained on 4.4 million face images using PyTorch and Diffusers, running on NVIDIA A100 GPUs for ten days, and showed improved fidelity and editability compared to other methods.
Despite its significant hardware demands, HyperLoRA shows promise for handling ad hoc customization efficiently, addressing a central challenge of zero-shot customization in generative AI.