A new paper on arXiv proposes a two-stage pipeline, built on a diffusion transformer, that improves control over typography and style in text rendering.
The authors introduce typography control fine-tuning (TC-FT) and a text-agnostic style control adapter (SCA) to address font inconsistency and style variation.
The approach targets precise, word-level application of typographic features and more consistent style across text rendering tasks.
The paper incorporates HTML-render into the data synthesis pipeline and provides a word-level controllable dataset for academic use.
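To make the idea of HTML-based data synthesis with word-level control concrete, here is a minimal sketch in Python. It is not the paper's pipeline: the attribute set (`STYLES`), the function name `synthesize_sample`, and the label format are all hypothetical, chosen only to illustrate how per-word typographic attributes could be encoded in HTML alongside matching annotations.

```python
import html
import random

# Hypothetical attribute set; the paper's actual typographic features may differ.
STYLES = {
    "bold": "font-weight:bold",
    "italic": "font-style:italic",
    "underline": "text-decoration:underline",
}


def synthesize_sample(text, fonts, rng, p_style=0.3):
    """Return (html_string, word_labels) for one synthetic sample.

    Each word gets its own <span> with a randomly chosen font and a random
    subset of STYLES, plus a label dict recording what was applied.
    Font names are assumed to be plain names without quote characters.
    """
    spans, labels = [], []
    for word in text.split():
        font = rng.choice(fonts)
        attrs = sorted(name for name in STYLES if rng.random() < p_style)
        css = ";".join([f"font-family:'{font}'"] + [STYLES[a] for a in attrs])
        # Escape the word so characters like '<' cannot break the markup.
        spans.append(f'<span style="{css}">{html.escape(word)}</span>')
        labels.append({"word": word, "font": font, "attrs": attrs})
    return "<p>" + " ".join(spans) + "</p>", labels


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible samples
    page, labels = synthesize_sample("Grand opening this Friday", ["Arial", "Georgia"], rng)
    print(page)
    for label in labels:
        print(label)
```

Turning the HTML string into an image would need a browser or other HTML renderer, which this sketch leaves out. The point is only that HTML makes word-level styling and its ground-truth labels easy to generate together.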