menu
techminis

A naukri.com initiative

google-web-stories
Home

>

Cloud News

>

How ‘Helpf...
source image

Hackernoon

1M

read

121

img
dot

Image Credit: Hackernoon

How ‘Helpful’ Retries Can Wreck Your System—and How to Stop It

  • Retrying actions in systems can be problematic if not done carefully, leading to potential system failures.
  • Differentiating between cases where retrying makes sense and where it doesn't is crucial for effective system design.
  • In scenarios where retries are appropriate, they can improve the overall availability of services by mitigating transient failures.
  • The cumulative effect of multiple dependencies can reduce the overall service availability, making retries necessary to maintain high availability.
  • The formula for calculating overall availability with retries involves considering the individual availability of dependencies and the number of retries.
  • While retries enhance availability, they can trigger a 'retry storm' if not controlled, leading to cascading failures and prolonged outages.
  • Strategies like bounded retries, circuit breaking, and retry techniques are essential to limit retries and prevent excessive load on failing services.
  • Techniques such as exponential backoff and TCP congestion control mechanisms can help in managing retries effectively.
  • Implementing guardrails on both client and server sides, like backpressure contracts and load shedding, can further safeguard the system during retries.
  • To build reliable distributed systems, understanding the impact of retries and implementing appropriate strategies is crucial for system resilience and performance.

Read Full Article

like

7 Likes

For uninterrupted reading, download the app