Image tokenizers map images to sequences of discrete tokens and are an essential component of autoregressive transformer-based image generation.
This paper proposes a spectral image tokenizer, which tokenizes the image spectrum obtained from a discrete wavelet transform (DWT) rather than the image pixels directly.
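To make the idea of an image spectrum from a wavelet transform concrete, the following is a minimal sketch of one level of a 2D Haar DWT in plain Python. It is an illustration only: the choice of the Haar wavelet, the function names (`haar_dwt2`, `haar_idwt2`, `coarse_to_fine`) and the band names are assumptions made here, not details taken from the paper, and the learned quantization step that would turn coefficients into discrete tokens is not shown.

```python
def haar_dwt2(img):
    """One level of an orthonormal 2D Haar DWT (illustrative, not the paper's code).

    `img` is a list of rows with even height and width. Returns four bands,
    each half the size of the input: a coarse approximation and three detail bands.
    """
    h, w = len(img), len(img[0])
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # local average (low frequency)
            rows[1].append((a - b + c - d) / 2)  # left-minus-right differences
            rows[2].append((a + b - c - d) / 2)  # top-minus-bottom differences
            rows[3].append((a - b - c + d) / 2)  # diagonal differences
        for band, row in zip((approx, horiz, vert, diag), rows):
            band.append(row)
    return approx, horiz, vert, diag


def haar_idwt2(approx, horiz, vert, diag):
    """Inverse of haar_dwt2: rebuilds each 2x2 pixel block from its four coefficients."""
    out = []
    for i in range(len(approx)):
        top, bottom = [], []
        for j in range(len(approx[0])):
            s, x, y, z = approx[i][j], horiz[i][j], vert[i][j], diag[i][j]
            top += [(s + x + y + z) / 2, (s - x + y - z) / 2]
            bottom += [(s + x - y - z) / 2, (s - x - y + z) / 2]
        out += [top, bottom]
    return out


def coarse_to_fine(bands):
    """Flatten the bands into one sequence, approximation first, details after."""
    return [v for band in bands for row in band for v in row]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
bands = haar_dwt2(image)
sequence = coarse_to_fine(bands)
reconstructed = haar_idwt2(*bands)
```

In this sketch the first quarter of `sequence` holds the low-frequency approximation and the remainder holds the detail bands, so a sequence model reading it left to right would see coarse content before fine content. Applying `haar_dwt2` again to the approximation band would give a multi-level decomposition.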
The spectral tokenizer exploits the fact that natural images are more compressible at high frequencies, and it can reconstruct images at different resolutions without retraining.
It improves the conditioning for next-token prediction relative to conventional tokenizers, and it enables partial decoding: a prefix of the token sequence is enough to reconstruct a coarse version of the image.
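Partial decoding can be illustrated at the level of raw wavelet coefficients with a small 1D sketch. The code below assumes an unnormalized multi-level Haar transform (pairwise averages and half-differences) on a signal whose length is a power of two; the function names are hypothetical, and the paper's tokenizer works with learned discrete tokens of a 2D image spectrum rather than with raw coefficients as shown here.

```python
def haar_forward(signal):
    """Multi-level 1D Haar transform using averages and half-differences.

    The output is ordered coarse to fine:
    [global average, coarsest detail, ..., finest details].
    """
    current = list(signal)
    details = []
    while len(current) > 1:
        avg = [(current[i] + current[i + 1]) / 2 for i in range(0, len(current), 2)]
        det = [(current[i] - current[i + 1]) / 2 for i in range(0, len(current), 2)]
        details = det + details  # coarser details are placed in front of finer ones
        current = avg
    return current + details


def haar_inverse(coeffs):
    """Invert haar_forward, doubling the resolution at every level."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        det = coeffs[pos:pos + len(current)]
        nxt = []
        for a, d in zip(current, det):
            nxt += [a + d, a - d]
        pos += len(current)
        current = nxt
    return current


def partial_decode(coeffs, k):
    """Decode from only the first k coefficients, treating the rest as zero."""
    kept = coeffs[:k] + [0.0] * (len(coeffs) - k)
    return haar_inverse(kept)


signal = [8, 6, 7, 5, 3, 1, 4, 2]
coeffs = haar_forward(signal)
coarse = partial_decode(coeffs, 2)           # two coefficients: a two-level staircase
full = partial_decode(coeffs, len(coeffs))   # all coefficients: exact reconstruction
```

With two of the eight coefficients, `coarse` is a piecewise-constant approximation of the signal (the mean of each half), and each further coefficient refines it. This mirrors, in a simplified setting, how a coarse-to-fine sequence can be decoded before it is complete.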
It also allows autoregressive models to be used for image upsampling, which broadens the range of image manipulation tasks they can handle.
The tokenizer is evaluated on reconstruction metrics, multiscale image generation, text-guided image upsampling, and editing.