Enable javascript in your browser for better experience. Need to know to enable it?

÷ÈÓ°Ö±²¥

Building resiliency with chaos engineering

Building resiliency with chaos engineering

A predictable system is a myth. System failures are inevitable but you can be prepared for failures by building resilient systems. We explore chaos engineering as a way to do exactly that.

Ìý

What is chaos engineering?

Chaos engineering or chaos testing is a Site Reliability Engineering (SRE) technique that simulates unexpected system failures to test a system's behavior and recovery plan. Based on what is learned from these tests, organizations design interventions and upgrades to strengthen their technology.

Ìý

Why do we need chaos engineering?

Ìý

Let’s look at an instance where one of our e-commerce customers sees their applications terminating one after another during a Black Friday sale. But, there is no CPU or memory spike. Ultimately, it turns out that writing logs in a file within the container led to running out of disk space.Ìý

Ìý

In the microservices world, it is not uncommon for one slow service to d