What’s Your Chaos Monkey?

Netflix runs an automated system designed to deliberately cause failures in its production environment.

They call this system Chaos Monkey and it “randomly takes production servers offline, forcing the system to heal itself or die trying.”

More specifically:

“Chaos Monkey is a service which identifies groups of systems and randomly terminates one of the systems in a group. The service operates at a controlled time (does not run on weekends and holidays) and interval (only operates during business hours). In most cases we have designed our applications to continue working when a peer goes offline, but in those special cases we want to make sure there are people around to resolve and learn from any problems. With this in mind Chaos Monkey only runs in business hours with the intent that engineers will be alert and able to respond.”

Or more simply:

“Chaos Monkey is a resiliency tool that helps applications tolerate random instance failures.”
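To make the description above concrete, here is a minimal sketch of the selection logic it describes: pick one random member from each group, but only on a non-holiday weekday during business hours. This is not Netflix's code. The function names, the shape of the `groups` dictionary, and the 9:00 to 17:00 window are all assumptions made for illustration, and nothing is actually terminated.

```python
import random
from datetime import date, datetime, time

# Assumed window; the quoted description only says "business hours".
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def may_run(now, holidays=frozenset()):
    """Return True only on a non-holiday weekday inside the assumed window."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    if now.date() in holidays:
        return False
    return BUSINESS_START <= now.time() < BUSINESS_END


def pick_victims(groups, now, holidays=frozenset(), rng=random):
    """Pick one random instance per non-empty group, or none outside the window.

    `groups` maps a hypothetical group name to a list of instance IDs.
    The function only chooses; the caller decides what to do with the picks.
    """
    if not may_run(now, holidays):
        return {}
    return {name: rng.choice(members) for name, members in groups.items() if members}


if __name__ == "__main__":
    groups = {"web": ["web-1", "web-2", "web-3"], "api": ["api-1", "api-2"]}
    # A Tuesday morning: one instance from each group is chosen at random.
    print(pick_victims(groups, datetime(2024, 3, 5, 10, 30)))
    # A Saturday: nothing is chosen.
    print(pick_victims(groups, datetime(2024, 3, 9, 10, 30)))
```

Keeping the "when may this run" check separate from the "who gets picked" step mirrors the quote: the randomness is real, but it is fenced in so that people are around to respond.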

What would your life be like if, instead of anxiously fearing failure at any moment, you set into motion something that would help build your resiliency?

I am not quite sure what this could look like.

Usually, resiliency is a skill you build after failure, but if you wait until then to start, you might not be ready to bounce back when it counts.

Pre-resiliency planning would mean finding ways to cope with failures before they happen.

It might include preventive-maintenance-style systems in your life.

Others might have some more extreme ideas.

I’m really not sure.

The point is that experimentation in the way you do things can lead to improvements.

Never making changes might seem like a steady, perfectly fine way to live.

Until it isn’t.

Even if you have a good thing going, the world around you is always up to something.

Others are acting outside your control, doing things you may or may not like.

You can change or complain; it’s up to you.

So my question to you is “What’s Your Chaos Monkey?”

