When you can’t serve exactly what the user wanted, how should graceful degradation count against your SLOs? Two approaches, and the key mistake to avoid.
Today, I believe we cannot successfully answer several key questions about SRE. Let’s start with the most important one: how can we understand what level of reliability customers want and need?