Charles Hooper

Thoughts and projects from a site reliability engineer

Briefly: Health Checks

Health checks are specially defined endpoints or routes in your web application that allow external monitors to determine its health. They are so important to production health that I consider them the “13th factor” of the Twelve-Factor App methodology (12factor).

A healthy application returns an HTTP 2xx or 3xx status code; an unhealthy one returns an HTTP 5xx status code.

This type of output allows a load balancer to remove unhealthy instances from its rotation, and it can also be used to alert an operator or even replace the instance automatically.
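To make the monitor's side concrete, here is a minimal sketch of how a load balancer or external check might interpret those status codes. The function names, the 2-second timeout, and the choice to treat anything outside 2xx/3xx (including connection errors) as unhealthy are my assumptions for illustration, not how any particular load balancer works.

```python
import urllib.error
import urllib.request


def is_healthy_status(status):
    """2xx and 3xx count as healthy; everything else is treated as unhealthy."""
    return 200 <= status < 400


def probe(url, timeout=2.0):
    """Request a health check URL and report whether the instance looks healthy."""
    try:
        # urlopen follows redirects by default and raises HTTPError for
        # error responses such as 4xx and 5xx.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy_status(resp.status)
    except urllib.error.HTTPError as exc:
        return is_healthy_status(exc.code)
    except OSError:
        # Connection refused, DNS failure, timeout: assume unhealthy.
        # (urllib.error.URLError is a subclass of OSError.)
        return False


def healthy_instances(urls, probe_fn=probe):
    """Return only the instances that should stay in rotation."""
    return [url for url in urls if probe_fn(url)]
```

A real monitor would usually require several consecutive failures before pulling an instance, so that a single slow response does not cause flapping.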

To be useful, your application’s health checks should:

  1. Return an HTTP 2xx or 3xx status code when healthy

  2. Return a HTTP 5xx status code when not healthy

  3. Include the reason why the check failed in the response body

  4. Log the requests and their results along with Request IDs

  5. Not have any side effects

  6. Be lightweight and fast
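As a rough illustration of the list above, here is a minimal sketch using only Python's standard library. The `/health` path, the `X-Request-ID` header, the JSON body shape, and the `check_dependencies` helper are all hypothetical choices of mine; adapt them to whatever your framework and monitors expect.

```python
import json
import logging
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("healthcheck")


def check_dependencies():
    """Hypothetical read-only checks; returns a list of failure reasons.

    Keep these cheap and free of side effects, e.g. a `SELECT 1` against
    the database with a short timeout.
    """
    return []


def evaluate_health(checks=check_dependencies):
    """Return (status_code, body): 200 when healthy, 503 with reasons when not."""
    failures = checks()
    if failures:
        return 503, {"status": "unhealthy", "reasons": failures}
    return 200, {"status": "ok"}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        # Reuse the caller's request ID if one was sent, otherwise make one.
        request_id = self.headers.get("X-Request-ID") or str(uuid.uuid4())
        status, body = evaluate_health()
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.send_header("X-Request-ID", request_id)
        self.end_headers()
        self.wfile.write(payload)
        # Log every check and its result alongside the request ID.
        log.info("health check request_id=%s status=%d body=%s",
                 request_id, status, body)


def serve(host="127.0.0.1", port=8080):
    """Run the health endpoint until interrupted."""
    HTTPServer((host, port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally. Keeping the decision in `evaluate_health` separate from the HTTP handler makes it easy to unit test the failure path without opening a socket.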

If you implement health checks in your application following this advice, you’ll have a more resilient, monitorable, and manageable application.

How about you all? Is there anything you would add?
