Managing Kubernetes clusters efficiently is critical, especially as clusters grow in size.
A significant challenge with large clusters is the memory overhead caused by list requests.
To better visualize the memory overhead challenge, a synthetic test was conducted against the kube-apiserver, which crashed at approximately 16:40 while serving only 16 informers. This kind of memory overflow, leading to resource exhaustion, poses a genuine risk and underscores the need to optimize the API server for better efficiency.
Kubernetes 1.32 introduces a major improvement in API streaming to address these efficiency and stability concerns. With the graduation of the watch list feature to beta, client-go users can opt in to streaming lists via a special category of watch requests, which reduces the temporary memory footprint significantly.
By streaming each item individually instead of returning the entire collection, the new method maintains a constant memory overhead.
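The memory effect of streaming versus listing can be illustrated with a toy sketch (this is not the actual client-go or kube-apiserver implementation; the `Pod`, `listPods`, and `streamPods` names below are purely illustrative). A classic list materializes the whole collection before the caller sees a single item, so peak memory grows with collection size; a stream hands items over one at a time, so only one item is in flight at any moment.

```go
package main

import "fmt"

// Pod is a stand-in for a (potentially large) API object.
type Pod struct {
	Name string
}

// listPods models a classic LIST: the entire collection is
// materialized in memory before it is returned to the caller,
// so peak memory grows linearly with n.
func listPods(n int) []Pod {
	pods := make([]Pod, 0, n)
	for i := 0; i < n; i++ {
		pods = append(pods, Pod{Name: fmt.Sprintf("pod-%d", i)})
	}
	return pods
}

// streamPods models a watch-based list: items are sent one at a
// time over an unbuffered channel, so the sender holds roughly
// constant memory regardless of n.
func streamPods(n int) <-chan Pod {
	ch := make(chan Pod)
	go func() {
		defer close(ch)
		for i := 0; i < n; i++ {
			ch <- Pod{Name: fmt.Sprintf("pod-%d", i)}
		}
	}()
	return ch
}

func main() {
	// Both approaches yield the same items; only peak memory differs.
	count := 0
	for range streamPods(1000) {
		count++
	}
	fmt.Println(count)
}
```

The consumer sees the same items either way; the difference is that the streaming producer never needs to hold the full collection at once, which is the property that keeps the API server's temporary memory overhead constant.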
The results showed significant improvements with the watch list feature enabled: memory consumption stabilized at approximately 2 GB, whereas with the feature disabled it climbed to roughly 20 GB.
API Priority and Fairness has proven to reasonably protect the kube-apiserver from CPU overload; however, it has far less impact on memory protection. As an increasing number of Kubernetes API requests with large response bodies arrive simultaneously, OOM crashes can be expected without this optimization.
To enable API streaming for your component, upgrade to Kubernetes 1.32, ensure etcd is running version 3.4.31+ or 3.5.13+, and change your client software to use watch lists.
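For clients built on client-go, one way to opt in is through client-go's environment-variable feature gates. The sketch below assumes the `WatchListClient` gate name from the client-go features mechanism; verify the exact gate name and default against the client-go release you run.

```shell
# Opt a client-go based component into streaming lists.
# client-go reads feature gates from KUBE_FEATURE_* environment
# variables, so this is set in the component's environment.
export KUBE_FEATURE_WatchListClient=true
```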
While the feature is in beta, only core components such as kube-controller-manager have it enabled by default. The watch list feature is expected to become available in other core components, such as kube-scheduler and the kubelet, once it graduates to general availability.