MixDiT is an algorithm-hardware co-designed accelerator that targets the heavy computation and long latency of Diffusion Transformer (DiT) inference.
MixDiT quantizes DiT activations with mixed Microscaling (MX) formats, selectively applying higher precision to magnitude-based outliers so that the remaining activations can be quantized to low precision while maintaining high accuracy and delivering substantial speedup.
The MixDiT accelerator is designed to support precision-flexible multiplications and efficient MX precision conversions, yielding a 2.10-5.32× speedup over an NVIDIA RTX 3090 GPU in the reported experiments.
MixDiT achieves this speedup without any loss in Fréchet Inception Distance (FID).
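
The idea of mixed-precision MX quantization with magnitude-based outliers can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the MixDiT implementation: it fake-quantizes blocks of 32 values that share one power-of-two scale (the block structure used by MX formats), uses signed integer elements instead of the MX floating-point element types, and picks outliers as the blocks with the largest absolute maximum. The function names, the 4-bit/8-bit choice, and the `outlier_frac` parameter are hypothetical.

```python
import numpy as np

BLOCK = 32  # number of elements sharing one scale, as in MX-style block formats


def quantize_block(block, bits):
    """Fake-quantize one block: shared power-of-two scale, signed `bits`-bit elements."""
    qmax = 2 ** (bits - 1) - 1
    amax = np.max(np.abs(block))
    if amax == 0:
        return block.copy()
    # Smallest power-of-two scale for which amax / scale still fits in [-qmax, qmax].
    scale = 2.0 ** np.ceil(np.log2(amax / qmax))
    q = np.clip(np.round(block / scale), -qmax, qmax)
    return q * scale  # dequantized values, for measuring the error


def mixed_mx_quantize(x, low_bits=4, high_bits=8, outlier_frac=0.05):
    """Quantize a 1-D array whose length is a multiple of BLOCK.

    Assumption: a block counts as an outlier when its absolute maximum is among
    the largest `outlier_frac` of all blocks; those blocks get `high_bits`.
    """
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1)
    n_high = max(1, int(round(outlier_frac * len(blocks))))
    high_idx = set(np.argsort(amax)[-n_high:].tolist())
    out = np.empty_like(blocks)
    for i, b in enumerate(blocks):
        out[i] = quantize_block(b, high_bits if i in high_idx else low_bits)
    return out.reshape(x.shape), sorted(high_idx)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.standard_normal(512)
    acts[100] = 50.0  # inject one large-magnitude activation (falls in block 3)
    mixed, high_blocks = mixed_mx_quantize(acts)
    low_only, _ = mixed_mx_quantize(acts, high_bits=4)
    print("high-precision blocks:", high_blocks)
    print("MSE mixed   :", np.mean((mixed - acts) ** 2))
    print("MSE low-only:", np.mean((low_only - acts) ** 2))
```

In this sketch a single large activation forces a coarse shared scale on its whole block, so promoting only that block to higher precision removes most of the error while the other blocks stay at low precision.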