The study analyzed the impact of deep learning runtime engines and execution providers on the energy consumption, execution time, and computing-resource utilization of code Small Language Models (SLMs).
Configurations using the CUDA execution provider consumed less energy and executed faster than their CPU execution provider counterparts.
TORCH paired with CUDA demonstrated the greatest energy efficiency, achieving energy savings of 37.99% to 89.16% relative to the other serving configurations.
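To make this configuration concrete, the following is a minimal sketch of a TORCH (PyTorch) serving setup with CUDA execution, assuming a Hugging Face causal code model; the model identifier, prompt, and generation length are illustrative placeholders, not the study's exact setup.

```python
# Minimal sketch: serving a code SLM with TORCH (PyTorch) on CUDA.
# The model name is a placeholder; the study's models may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/codegen-350M-mono"  # illustrative code SLM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.to("cuda")   # CUDA execution: weights and compute live on the GPU
model.eval()

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():  # inference only; no autograd bookkeeping
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```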
Among CPU-based configurations, optimized runtime engines such as ONNX achieved energy savings of 8.98% to 72.04%.
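For comparison, a minimal sketch of the ONNX configuration follows, assuming the model has already been exported to an .onnx graph with a single input_ids input; the file path and token ids are hypothetical. Swapping the providers list to ["CUDAExecutionProvider"] selects the GPU-based counterpart, which is how the execution provider configurations above differ.

```python
# Minimal sketch: running an exported code SLM with ONNX Runtime on CPU.
# "model.onnx" and the token ids are placeholders for the exported graph.
import numpy as np
import onnxruntime as ort

# The providers list selects the execution provider; use
# ["CUDAExecutionProvider"] instead for the GPU configuration.
session = ort.InferenceSession("model.onnx",
                               providers=["CPUExecutionProvider"])

input_ids = np.array([[1, 2, 3]], dtype=np.int64)  # placeholder token ids
feeds = {session.get_inputs()[0].name: input_ids}
logits = session.run(None, feeds)[0]  # first output of the graph
print(logits.shape)
```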