Efficient on-device neural network (NN) inference has various advantages over cloud-based processing, including predictable latency, enhanced privacy, greater reliability, and reduced operating costs for vendors.
This paper presents the first comparative evaluation and independent benchmarks of commercially available microcontroller-scale neural processing units (µNPUs) designed for ultra-low-power applications.
A model compilation framework is developed and open-sourced to enable consistent benchmarking of quantized models across diverse µNPU hardware. The benchmarks cover model inference latency, power consumption, and memory overhead.
The analysis reveals both expected performance trends and surprising disparities between hardware specifications and measured performance, including unexpected scaling behaviors as model complexity increases. These findings offer valuable insights for hardware designers and software developers in the rapidly evolving µNPU space.