The study analyzed the impact of deep learning runtime engines and execution providers on the energy consumption, execution time, and computing-resource utilization of code Small Language Models (SLMs).
Configurations using the CUDA execution provider consumed less energy and executed faster than their CPU execution provider counterparts.
TORCH paired with CUDA demonstrated the greatest energy efficiency, achieving energy savings of 37.99% to 89.16% relative to the other serving configurations.
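To make this configuration concrete, the following is a minimal sketch of a TORCH (PyTorch) serving setup with CUDA execution, assuming a Hugging Face causal code model; the model identifier, prompt, and generation length are illustrative placeholders, not the study's exact setup.

```python
# Minimal sketch: serving a code SLM with TORCH (PyTorch) on CUDA.
# The model name is a placeholder; the study's models may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/codegen-350M-mono"  # illustrative code SLM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.to("cuda")   # CUDA execution: weights and compute live on the GPU
model.eval()

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():  # inference only; no autograd bookkeeping
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```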
Among CPU-based configurations, optimized runtime engines such as ONNX achieved energy savings of 8.98% to 72.04%.
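For comparison, a minimal sketch of the ONNX configuration follows, assuming the model has already been exported to an .onnx graph with a single input_ids input; the file path and token ids are hypothetical. Swapping the providers list to ["CUDAExecutionProvider"] selects the GPU-based counterpart, which is how the execution provider configurations above differ.

```python
# Minimal sketch: running an exported code SLM with ONNX Runtime on CPU.
# "model.onnx" and the token ids are placeholders for the exported graph.
import numpy as np
import onnxruntime as ort

# The providers list selects the execution provider; use
# ["CUDAExecutionProvider"] instead for the GPU configuration.
session = ort.InferenceSession("model.onnx",
                               providers=["CPUExecutionProvider"])

input_ids = np.array([[1, 2, 3]], dtype=np.int64)  # placeholder token ids
feeds = {session.get_inputs()[0].name: input_ids}
logits = session.run(None, feeds)[0]  # first output of the graph
print(logits.shape)
```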