“We guarantee 99.9% uptime.”
It sounds impressive. It sounds like near-perfection. It’s become such a standard claim that most businesses nod along without questioning what it actually means.
Here’s what most people don’t realize: 99.9% uptime allows for over 8 hours of downtime per year. And that might be the least surprising thing hiding in your vendor’s SLA.
The Math Behind the Nines
The relationship between uptime percentages and downtime isn’t linear - it’s logarithmic. Each additional “nine” cuts the permitted downtime by a factor of 10. Here’s what each level actually permits:
| Uptime % | Common Name | Downtime/Year | Downtime/Month | Downtime/Day |
|---|---|---|---|---|
| 99% | Two nines | 3.65 days | 7.31 hours | 14.4 minutes |
| 99.9% | Three nines | 8.76 hours | 43.8 minutes | 1.44 minutes |
| 99.99% | Four nines | 52.6 minutes | 4.38 minutes | 8.6 seconds |
| 99.999% | Five nines | 5.26 minutes | 26.3 seconds | 0.86 seconds |
Source: uptime.is
A simple way to remember: five nines allows approximately 5 minutes of downtime per year. Each fewer nine multiplies that by 10.
That “99.9% guarantee” your hosting provider advertises? It means they can be down for almost 44 minutes every single month and still technically meet their SLA.
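The figures in the table are plain arithmetic, so you can check any uptime percentage yourself. Here is a minimal Python sketch; the function name `allowed_downtime` is illustrative, and it assumes a 365-day year with a month counted as one twelfth of that, so results can differ from the table in the last digit.

```python
from datetime import timedelta

# Assumption: a 365-day year, and a "month" of one twelfth of a year (730 hours).
PERIODS = {
    "year": timedelta(days=365),
    "month": timedelta(days=365) / 12,
    "day": timedelta(days=1),
}


def allowed_downtime(uptime_percent: float, period: str = "year") -> timedelta:
    """Return the downtime an uptime percentage permits over the given period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return PERIODS[period] * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        yearly_hours = allowed_downtime(pct, "year").total_seconds() / 3600
        monthly_minutes = allowed_downtime(pct, "month").total_seconds() / 60
        print(f"{pct}%: {yearly_hours:.2f} h/year, {monthly_minutes:.2f} min/month")
```

Plugging in the exact percentage from a contract, rather than the nearest “nines” label, is usually the quickest way to see what a vendor is really committing to.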
What the Major Cloud Providers Actually Promise
Let’s look at what the big three actually guarantee:
Amazon Web Services
AWS’s compute SLA offers different guarantees based on your architecture:
- Multi-AZ deployments: 99.99% (52 minutes/year)
- Single-instance in one AZ: 99.5% (1.83 days/year)
That’s a massive difference. If you’re running a single EC2 instance without redundancy, AWS only promises to be up 99.5% of the time - allowing for nearly 44 hours of downtime per year.
Microsoft Azure
Azure’s VM SLA similarly varies by configuration:
- VMs across Availability Zones: 99.99%
- Single-instance VMs with premium storage: 99.9%
Google Cloud
Google Compute Engine promises 99.99% for instances deployed across multiple zones.
The pattern is clear: high availability SLAs require you to architect for redundancy. A single server with a 99.9% SLA has a very different risk profile than a distributed system with 99.99%.
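One way to see why redundancy earns the higher number is to model it. The sketch below is an idealized calculation, not how any provider derives its SLA: it assumes replicas fail independently and that any single healthy replica can serve traffic. Real zones share some failure modes, so treat the result as an upper bound. The function name `combined_availability` is illustrative.

```python
def combined_availability(availabilities: list[float]) -> float:
    """Availability of redundant replicas when any one of them is enough.

    Assumes failures are statistically independent, which is optimistic
    for real availability zones.
    """
    if not availabilities:
        raise ValueError("at least one replica is required")
    p_all_down = 1.0
    for availability in availabilities:
        if not 0 <= availability <= 1:
            raise ValueError("availabilities must be fractions between 0 and 1")
        # The system is down only if every replica is down at the same time.
        p_all_down *= 1 - availability
    return 1 - p_all_down


if __name__ == "__main__":
    # Two instances, each at the 99.5% single-instance level quoted above.
    print(f"{combined_availability([0.995, 0.995]):.6f}")
```

Under these assumptions, two 99.5% instances in separate zones come out well above 99.99%, which is roughly why the headline SLA is tied to multi-zone deployments.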
The Fine Print That Changes Everything
Here’s where SLAs get interesting - and by interesting, I mean concerning.
Exclusions That Void the Guarantee
Most cloud provider SLAs include exclusions that can void your protection entirely. Common exclusions include:
- Force majeure events - Natural disasters, wars, government actions
- Internet access problems - Issues outside the provider’s network
- Customer actions or inactions - Including configuration errors
- Customer equipment or software - Problems in your code or infrastructure
- Scheduled maintenance - If they notify you in advance, it doesn’t count
- Third-party failures - Services they depend on but don’t control
That last one is particularly important. If your website goes down because of a DNS provider failure, your hosting company may argue it wasn’t their fault - even though your customers still couldn’t reach you.
Architecture Requirements
This is the gotcha that catches many businesses: credits only apply if you’ve architected correctly.
If your application goes down because you deployed a single instance in one availability zone and that zone has an outage, the 99.99% multi-AZ commitment never applied to you. AWS can meet that SLA while your single-zone deployment is down.
The 99.99% guarantee often requires:
- Deployment across multiple availability zones
- Proper load balancing configuration
- Redundant database instances
- Specific storage configurations
Running a simpler architecture? You’re likely covered by a much lower SLA than the headline number suggests.
How Uptime Is Measured
Different providers measure “unavailability” differently:
- Network-level availability - The server is reachable
- Service-level availability - The application responds
- End-user availability - Customers can actually use the service
A provider might measure their SLA at the network level while your users experience application-level problems. The infrastructure is “up” by their definition while your business is effectively down.
Request Requirements
Here’s something many businesses don’t realize: SLA credits aren’t automatic.
Google Cloud requires customers to notify technical support within 30 days of becoming eligible for a credit and to provide log files showing the downtime periods with dates and times. Failure to comply forfeits your right to receive credits.
Most providers require you to:
- Detect and document the outage yourself
- File a support ticket within a specified window
- Provide evidence of the downtime
- Wait for the provider to validate your claim
If you don’t have monitoring in place to detect and document outages, you may never know you were eligible for credits.
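Because the clock starts when you become eligible, not when you notice, it helps to compute the filing deadline the moment an outage is confirmed. A small sketch follows; the function names and the `window_days` value used in the example are hypothetical, so take the real window from your provider’s SLA.

```python
from datetime import date, timedelta


def claim_deadline(eligible_on: date, window_days: int) -> date:
    """Last day a credit request can be filed, given the provider's claim window."""
    if window_days < 0:
        raise ValueError("window_days must not be negative")
    return eligible_on + timedelta(days=window_days)


def can_still_claim(eligible_on: date, window_days: int, today: date) -> bool:
    """True if a claim filed today would still fall inside the window."""
    return today <= claim_deadline(eligible_on, window_days)


if __name__ == "__main__":
    # Hypothetical values: eligibility on 1 March 2024 with a 30-day window.
    print(claim_deadline(date(2024, 3, 1), 30))
```

Storing the deadline alongside the incident record means nobody has to reread the contract while the outage is still being cleaned up.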
The Compensation Gap: Why SLA Credits Don’t Cover Your Losses
Let’s talk about what happens when the SLA is breached.
The Math of SLA Credits
Here’s a realistic scenario: Your business uses a small AWS instance costing $3 per month. The instance goes down for 6 hours due to a provider issue.
Six hours out of a roughly 730-hour month leaves monthly uptime at about 99.2%. Since that is still above 99%, you receive 10% of your monthly bill as credit - approximately 30 cents.
Meanwhile, those 6 hours may have cost your business thousands in lost sales, damaged reputation, and customer support overhead.
Even in a worst-case scenario where the resource is down for more than 36 hours, you’d only receive a full refund of that resource’s monthly cost - still nothing compared to actual business losses.
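The scenario above can be sketched as a tiered lookup. The tier table here is illustrative: the 10% and 100% tiers follow the example in the text, while the 30% middle tier, the 99.5% SLA threshold, and the 730-hour month are assumptions, so substitute the schedule from your own contract.

```python
HOURS_PER_MONTH = 730  # assumption: one twelfth of a 365-day year

# (minimum monthly uptime %, credit as a fraction of the monthly bill).
# Illustrative only: the 30% tier is an assumption, not taken from the text.
CREDIT_TIERS = [
    (99.0, 0.10),
    (95.0, 0.30),
    (0.0, 1.00),
]


def monthly_uptime_percent(downtime_hours: float) -> float:
    """Monthly uptime percentage for a given number of downtime hours."""
    return 100 * (1 - downtime_hours / HOURS_PER_MONTH)


def sla_credit(monthly_bill: float, downtime_hours: float,
               sla_percent: float = 99.5) -> float:
    """Service credit owed for the month under the illustrative tier table."""
    uptime = monthly_uptime_percent(downtime_hours)
    if uptime >= sla_percent:
        return 0.0  # the SLA was met, so no credit is due
    for floor, fraction in CREDIT_TIERS:
        if uptime >= floor:
            return monthly_bill * fraction
    return monthly_bill  # downtime exceeded the whole month; cap at a full refund


if __name__ == "__main__":
    # The example from the text: a $3/month instance that is down for 6 hours.
    print(f"${sla_credit(3.00, 6):.2f}")
```

Whatever the tiers are, the credit is bounded by the monthly bill, which is the point of this section.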
Real-World Disparity
One analysis found a case where a SaaS provider offered $3,200 in service credits for an outage that caused over $2 million in actual customer losses - roughly 0.16% of the real impact.
This isn’t an anomaly. Downtime costs large businesses an average of $9,000 per minute, while SLA credits are capped at monthly subscription fees, so the gap between compensation and losses is inherent to how SLAs are structured.
The “Sole Remedy” Clause
Most enterprise SLAs include language making credits your “sole and exclusive remedy” for any unavailability. This limits your legal recourse and ensures you can’t seek damages beyond the credit amount.
The SLA isn’t designed to make you whole after an outage. It’s designed to incentivize the provider to maintain service and serve as a marketing signal that they take reliability seriously.
The 100% Uptime Myth
Some providers advertise “100% uptime guarantees.” Be skeptical.
One analysis found a SaaS provider promising “100% uptime” whose SLA only provided compensation after 0.05% downtime per month - more than 20 minutes of allowed downtime.
True 100% uptime is practically impossible. Even the most reliable systems experience occasional issues. A provider claiming 100% is either:
- Using creative definitions of “uptime”
- Hiding the real terms in fine print
- Making a promise they can’t keep
What Actually Matters When Evaluating SLAs
Given all these caveats, here’s how to actually evaluate vendor reliability:
1. Look at Track Record, Not Just Promises
Historical uptime data matters more than SLA promises. Ask vendors for:
- Actual uptime statistics over the past 12-24 months
- Incident history and post-mortems
- Status page transparency
A vendor with a 99.9% SLA who has actually delivered 99.99% is better than one promising 99.99% with a history of outages.
2. Understand the Architecture Requirements
Ask specifically:
- What architecture is required to qualify for the headline SLA?
- What’s the SLA for single-region or single-zone deployments?
- What configurations void the guarantee?
3. Read the Exclusions
Identify what’s explicitly excluded:
- Scheduled maintenance windows
- Third-party dependencies
- “Customer-caused” issues
- Force majeure
4. Understand the Claim Process
Know before you need it:
- How long do you have to file a claim?
- What documentation is required?
- How is “downtime” measured?
5. Calculate Your Actual Risk
Do the math for your business:
- How much does an hour of downtime cost you?
- How does that compare to maximum SLA credits?
- What’s your risk exposure beyond what’s covered?
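These three questions reduce to a couple of lines of arithmetic. A rough sketch, with illustrative function names; the inputs in the example reuse figures quoted earlier in this article and should be replaced with your own estimates.

```python
def uncovered_exposure(cost_per_hour: float, outage_hours: float,
                       max_credit: float) -> float:
    """Estimated loss from an outage that the maximum SLA credit does not cover."""
    loss = cost_per_hour * outage_hours
    return max(loss - max_credit, 0.0)


def credit_coverage_ratio(credit: float, loss: float) -> float:
    """Fraction of the actual loss that the SLA credit pays back."""
    if loss <= 0:
        raise ValueError("loss must be positive")
    return credit / loss


if __name__ == "__main__":
    # Figures quoted in this article: $3,200 in credits against $2 million in losses.
    print(f"{credit_coverage_ratio(3_200, 2_000_000):.2%}")
    # A 6-hour outage at $9,000 per minute, with the credit capped at a $3 monthly bill.
    print(f"${uncovered_exposure(9_000 * 60, 6, 3.00):,.2f}")
```

If the uncovered figure is larger than you can absorb, that number is the budget case for the redundancy and monitoring described in the next section.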
Building Your Own Safety Net
Given that SLAs are marketing tools more than insurance policies, smart businesses build their own protection:
Monitor Independently
Don’t rely on your provider’s status page to know when there’s a problem. External monitoring from multiple locations gives you:
- Early warning of issues
- Documentation for SLA claims
- Data your provider might not report
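Turning raw probe results into documented outage windows can be as simple as the sketch below. It assumes you already collect time-ordered `(timestamp, is_up)` samples from an external checker; the `Outage` class and `find_outages` function are illustrative names, not part of any monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def find_outages(samples: list[tuple[datetime, bool]]) -> list[Outage]:
    """Collapse time-ordered (timestamp, is_up) probe results into outage windows.

    An outage starts at the first failed probe and ends at the next successful
    one. An outage still open at the final sample is closed at that timestamp.
    """
    outages: list[Outage] = []
    down_since = None
    last_ts = None
    for ts, is_up in samples:
        if not is_up and down_since is None:
            down_since = ts  # first failed probe of a new outage
        elif is_up and down_since is not None:
            outages.append(Outage(down_since, ts))
            down_since = None
        last_ts = ts
    if down_since is not None and last_ts is not None:
        outages.append(Outage(down_since, last_ts))
    return outages


if __name__ == "__main__":
    t0 = datetime(2024, 3, 1, 12, 0)
    probes = [(t0 + timedelta(minutes=i), up)
              for i, up in enumerate([True, False, False, True, True])]
    for outage in find_outages(probes):
        print(outage.start, outage.end, outage.duration)
```

The measured windows are only as precise as the probe interval, so a one-minute check can miss or mis-time short blips. Keep the raw samples as well, since providers typically ask for timestamps rather than summaries.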
Architect for Failure
Assume things will break. Design systems that:
- Span multiple availability zones
- Have automated failover
- Can operate in degraded mode
- Recover quickly from failures
Have a Backup Plan
For critical services, consider:
- Multi-cloud redundancy for essential systems
- Geographic distribution across providers
- Documented procedures for provider failures
Document Everything
When outages occur:
- Log exact start and end times
- Screenshot error messages and status pages
- Save any communication from the provider
- Calculate actual business impact
The Bottom Line
An SLA is not insurance. It’s not a guarantee that you won’t experience downtime. It’s a baseline commitment with significant limitations, exclusions, and caps on compensation.
The 99.9% uptime guarantee that sounds impressive allows for 8+ hours of annual downtime, may not cover single-server deployments, excludes many common failure scenarios, and compensates you with pennies when breached.
Smart businesses treat SLAs as one data point among many when evaluating vendors - not as protection against the real cost of downtime.
The only reliable protection against downtime impact is your own preparation: independent monitoring, resilient architecture, and business continuity planning that doesn’t depend on vendor compensation.
Want to know when your services are actually down - regardless of what your vendor’s status page says? FlareWarden monitors your infrastructure from outside your network and alerts you immediately, giving you the documentation you need for SLA claims and the early warning you need to respond.