A study introduces Indirect Prompt Gradient Optimization (IPGO), a framework for prompt-level fine-tuning of text-to-image diffusion models.
IPGO augments the prompt embedding by injecting continuously differentiable tokens at its beginning and end, so the prompt can be optimized with gradient-based methods.
The results show that IPGO consistently outperforms state-of-the-art baselines on image aesthetics, image-text alignment, and human preference.
IPGO improves image generation quality while requiring little training data and limited computational resources.
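To make the prefix/suffix idea concrete, here is a minimal pure-Python sketch under heavy assumptions. It is not the IPGO implementation: the embeddings are tiny hand-written vectors standing in for a text encoder's output, and the hypothetical `toy_reward` (negative squared distance between the mean-pooled token sequence and a target vector) replaces the aesthetic or alignment reward that would be computed through the diffusion model. In this sketch only the injected prefix and suffix tokens are updated, by gradient ascent with a hand-derived gradient, while the original prompt tokens stay frozen. The function and parameter names (`optimize_prefix_suffix`, `n_prefix`, `n_suffix`, `lr`, `steps`) are illustrative.

```python
import random


def mean_pool(tokens):
    """Average a list of equal-length embedding vectors, one per token."""
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]


def toy_reward(pooled, target):
    """Hypothetical stand-in reward: negative squared distance to a target vector."""
    return -sum((p - q) ** 2 for p, q in zip(pooled, target))


def optimize_prefix_suffix(prompt, target, n_prefix=2, n_suffix=2,
                           lr=0.5, steps=200, seed=0):
    """Gradient ascent on injected prefix/suffix tokens; `prompt` stays frozen."""
    rng = random.Random(seed)
    dim = len(prompt[0])
    # Learnable tokens, initialized near zero (a hypothetical choice).
    prefix = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_prefix)]
    suffix = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_suffix)]
    n_total = n_prefix + len(prompt) + n_suffix

    initial = toy_reward(mean_pool(prefix + prompt + suffix), target)
    for _ in range(steps):
        pooled = mean_pool(prefix + prompt + suffix)
        # Chain rule: dR/dpooled = -2 (pooled - target), dpooled/dtoken = 1 / n_total.
        grad = [-2.0 * (p - q) / n_total for p, q in zip(pooled, target)]
        # Update only the injected tokens; the inner lists are mutated in place.
        for tok in prefix + suffix:
            for d in range(dim):
                tok[d] += lr * grad[d]
    final = toy_reward(mean_pool(prefix + prompt + suffix), target)
    return prefix, suffix, initial, final


if __name__ == "__main__":
    prompt = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [0.5, 0.5, 0.0]]  # frozen toy embeddings
    target = [2.0, -1.0, 1.0]  # hypothetical "high-reward" direction
    prefix, suffix, r0, r1 = optimize_prefix_suffix(prompt, target)
    print(f"reward before: {r0:.4f}, after: {r1:.6f}")
```

Because mean pooling gives every injected token the same gradient, this toy cannot show the tokens specializing by position; a reward evaluated through an actual text encoder and image generator would generally produce position-dependent gradients.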