Quantization is the conversion of continuous numerical values, such as the parameters and activations of neural networks, into discrete representations. In effect, it maps a broad range of real numbers onto a smaller set of discrete values.
Neural networks often comprise millions to billions of parameters, making them computationally expensive to train, deploy, and execute, particularly on resource-constrained devices.
By quantizing neural network parameters, we can dramatically reduce the memory requirements and computational overhead associated with these models. For example, one billion parameters stored as 32-bit floats occupy roughly 4 GB, while the same parameters stored as 8-bit integers occupy about 1 GB.
Quantization can be classified into two main types: uniform and non-uniform. Uniform quantization divides the input range into evenly spaced intervals, while non-uniform quantization allows the intervals to vary in width, for example to place more levels where values are densely concentrated.
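To make the uniform case concrete, here is a minimal NumPy sketch of uniform (affine) quantization to 8-bit integers. The function names, the per-tensor min/max calibration, and the unsigned 8-bit range are illustrative assumptions for this example, not the API of any particular library.

```python
import numpy as np

def uniform_quantize(x, num_bits=8):
    """Map float values to evenly spaced integer levels via a scale and zero-point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    # The scale is the width of one quantization interval; guard against a zero range.
    scale = (x_max - x_min) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = uniform_quantize(weights)
print(np.abs(weights - dequantize(q, scale, zp)).max())  # small reconstruction error
```

A non-uniform scheme would replace the single fixed scale with an unevenly spaced codebook, trading this simplicity for a closer fit to the value distribution.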
Quantization can also target different parts of a model: the weights, the activations, or the entire network.
Post-Training Quantization (PTQ) quantizes the neural network after it has been trained, while Quantization-Aware Training (QAT) integrates quantization into the training process itself.
Quantization-aware training tends to retain more accuracy because it simulates the effects of quantization during training, allowing the model to adapt to the reduced-precision constraints.
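As a rough illustration of the QAT idea, the sketch below fake-quantizes a weight tensor in the forward pass while letting gradients pass straight through the non-differentiable rounding (a straight-through estimator). The function name, bit width, and min/max scaling are assumptions made for this example rather than a reference implementation.

```python
import torch

def fake_quantize(x, num_bits=8):
    """Quantize-then-dequantize in the forward pass so training sees quantization noise."""
    qmax = 2 ** num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / qmax
    zero_point = (-x.min() / scale).round()
    q = ((x / scale + zero_point).round().clamp(0, qmax) - zero_point) * scale
    # Straight-through estimator: forward uses the quantized values,
    # backward treats the operation as the identity.
    return x + (q - x).detach()

# Toy training step: the weights experience quantization in the forward pass,
# so the optimizer can adapt them to the discrete grid.
w = torch.randn(16, 16, requires_grad=True)
x = torch.randn(4, 16)
loss = (x @ fake_quantize(w)).pow(2).mean()
loss.backward()
print(w.grad.shape)  # gradients flow despite the rounding
```

Post-training quantization, by contrast, would simply apply a mapping like uniform_quantize above to the already-trained weights, with activation ranges typically estimated from a small calibration set.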
Quantization represents an important advance in artificial intelligence, enabling the widespread adoption of AI in diverse real-world applications.
Ongoing research aims to mitigate the accuracy loss associated with quantization and strike a balance between precision and efficiency.