techminis

A naukri.com initiative

Dev

5d

Image Credit: Dev

The Ripple Effect: How a Single Push Notification Brought Down Our Kubernetes Cluster

  • A single push notification sent to the user base exposed the fragility of the team's Kubernetes infrastructure: traffic to some services jumped 12x, node CPU utilization rose from 45% to 95%, and pods were evicted faster than they could stabilize.
  • In response, the team redesigned its infrastructure and initial platform setup around four goals: rapid scaling, resource efficiency, reliability, and cost optimization.
  • The team redesigned the EKS control plane architecture with a robust multi-AZ configuration, created a dedicated VPC for cluster operations, implemented private API endpoints, optimized CNI settings, and added security measures.
  • The team tackled bottlenecks in CNI configuration, suboptimal route tables, and DNS resolution, and analyzed kubelet startup procedures, container runtime configurations, and node initialization scripts. The result was a dramatic improvement in node boot times, CNI setup, image pull times, and pod scheduling times.
  • Adopting Karpenter and KEDA further shortened node provisioning time, sped up scale-up decisions, and improved resource utilization.
  • Today the team's platform runs with newfound confidence: average node provisioning time, p95 pod scheduling latency, resource utilization, and platform availability all reflect the transformation.
  • In Kubernetes, every setting, limit, and policy creates its own ripple effect. Understanding and harnessing these effects is key to success.
  • Future directions include exploring component-level analysis, performance optimization techniques, and testing methodologies to catch problems before production.
  • The team wants to learn from others' hard lessons and invites readers to share their experiences in the comments section of the post.
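
To give a feel for why a 12x traffic spike can overwhelm a cluster so quickly, here is a minimal sketch of the scaling arithmetic. It assumes the replica rule documented for the Kubernetes Horizontal Pod Autoscaler, ceil(currentReplicas × currentMetric / targetMetric), and ignores the autoscaler's tolerance band and stabilization windows. Only the 12x multiplier comes from the article; the baseline replica count, per-pod request rates, and pods-per-node figure are hypothetical values chosen for illustration, and the helper names are not from the original post.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """HPA-style rule: ceil(current_replicas * current_metric / target_metric).

    Simplified: the real autoscaler also applies a tolerance band and
    stabilization windows, which are left out here.
    """
    return math.ceil(current_replicas * current_metric / target_metric)


def nodes_needed(replicas: int, pods_per_node: int) -> int:
    """Nodes required if every node can host pods_per_node pods of this service."""
    return math.ceil(replicas / pods_per_node)


# Hypothetical baseline; these numbers are NOT from the article.
baseline_replicas = 10
baseline_rps_per_pod = 80.0   # requests/second each pod handles before the spike
target_rps_per_pod = 100.0    # autoscaling target per pod
pods_per_node = 8

# The 12x traffic multiplier is the one figure taken from the article.
spike_multiplier = 12
spike_rps_per_pod = baseline_rps_per_pod * spike_multiplier

replicas = desired_replicas(baseline_replicas, spike_rps_per_pod, target_rps_per_pod)
nodes_before = nodes_needed(baseline_replicas, pods_per_node)
nodes_after = nodes_needed(replicas, pods_per_node)

print(f"replicas: {baseline_replicas} -> {replicas}")
print(f"nodes:    {nodes_before} -> {nodes_after}")
```

Under these assumed numbers the service would need roughly ten times as many replicas and six times as many nodes almost at once, which is why node provisioning time and pod scheduling latency become the limiting factors during a spike.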
