On-Call That Doesn't Burn Out Your Team: Building Sustainable Incident Response

47% of DevOps engineers say on-call overload contributes directly to burnout. Learn how Google limits pages to 2 per shift, why companies pay $1,000/week for on-call, and how to build rotations that protect both uptime and your team.

FlareWarden Team
10 min read

The page comes in at 3 AM. You fumble for your phone, squinting at the screen. By the time you’ve diagnosed the issue, mitigated the problem, and documented what happened, it’s 5 AM. You try to sleep for another hour before your alarm goes off.

This isn’t a rare occurrence. It’s Tuesday.

For too many engineering teams, this is the reality of on-call work. And the data shows it’s unsustainable: 47% of DevOps engineers say on-call overload contributes directly to burnout or frustration.

But on-call doesn’t have to be this way. The best organizations have figured out how to maintain reliable systems without destroying their teams in the process.

The Burnout Reality

The numbers paint a stark picture of engineering burnout in 2024-2025:

On-call is a major contributor to this burnout. The “always-on” culture prevalent in DevOps fosters an environment where rest and recovery are deprioritized.

The Hidden Cost: Attrition

Burned-out engineers leave. And replacing them is expensive.

According to research on IT work-life balance, unspoken frustrations about punishing on-call schedules contribute directly to employee attrition. In major tech hubs like London and San Francisco, it can cost $300,000 or more to replace a single engineer.

The same research found:

  • 23.1% of IT professionals said poor work-life balance prompted them to consider leaving
  • One in four said the “always-on” nature of their work makes their jobs unmanageable
  • 56.7% accept disrupted sleep and poor work-life balance as “just part of the job”

Perhaps most concerning: 72% of respondents said their management team has little to no visibility into how on-call work negatively affects employees’ personal lives.

The Google Standard: 2 Pages Per Shift

If anyone has figured out sustainable on-call at scale, it’s Google. Their Site Reliability Engineering (SRE) practices have become the industry standard for good reason.

The Page Budget

Google has established a clear limit: a maximum of 2 paging incidents per 12-hour on-call shift.

The reasoning is simple arithmetic. Google has found that, on average, the work that follows an on-call incident (root-cause analysis, remediation, and follow-up activities like writing a postmortem and fixing bugs) takes 6 hours. A 12-hour shift therefore has room for two incidents at most.

The ideal? A median of zero pages per shift. If a given component causes pages every day, part of the budget is already spent before the shift starts, and anything else that breaks is likely to push the shift past the limit.
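
The budget arithmetic is easy to sketch. The snippet below is a minimal illustration, not Google’s tooling: the function names and the sample page counts are hypothetical, and the 6-hour handling time is the average quoted above.

```python
from statistics import median


def page_budget(shift_hours: float, hours_per_incident: float = 6.0) -> int:
    """Pages a shift can absorb if each incident consumes hours_per_incident."""
    return int(shift_hours // hours_per_incident)


def review_shifts(pages_per_shift: list[int], budget: int) -> dict:
    """Summarize recent shifts against the page budget."""
    return {
        "median_pages": median(pages_per_shift),
        "shifts_over_budget": sum(1 for n in pages_per_shift if n > budget),
        "worst_shift": max(pages_per_shift),
    }


budget = page_budget(12)  # 12 hours // 6 hours per incident
# Hypothetical page counts for the last seven shifts.
report = review_shifts([0, 0, 1, 0, 5, 2, 0], budget)
print(budget, report)
```

Tracking the median alongside the over-budget count matters: a rotation can look fine on average while one noisy shift blows through the limit.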

What Happens When You Exceed the Budget

Google’s SRE book describes a cautionary tale: a team regularly receiving five paging incidents per shift instead of their budget of two.

The consequences were predictable:

Sound familiar? Many teams live this reality without recognizing it as a systemic failure.

The 50/25/25 Rule

Google’s approach to SRE work allocation provides another useful framework:

Activity | Target % of Time
Engineering/project work | At least 50%
On-call duties | No more than 25%
Other operational work | Up to 25%

Source: Google SRE Book

The key insight: SRE work should be a healthy mix of duties, with at least half of time spent on project work. On-call should never become someone’s entire job.
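
One way to keep an eye on this split is to compare logged hours against the three targets. This is a rough sketch under assumptions: the category keys (`project`, `on_call`, `other_ops`) and the sample hours are invented for illustration, and real data would come from whatever time tracking a team already has.

```python
def check_allocation(hours: dict[str, float]) -> list[str]:
    """Compare logged hours against the 50/25/25 targets."""
    total = sum(hours.values())
    if total <= 0:
        raise ValueError("no hours logged")
    share = {name: value / total for name, value in hours.items()}

    problems = []
    if share.get("project", 0.0) < 0.50:
        problems.append("project work is below 50% of time")
    if share.get("on_call", 0.0) > 0.25:
        problems.append("on-call is above 25% of time")
    if share.get("other_ops", 0.0) > 0.25:
        problems.append("other operational work is above 25% of time")
    return problems


# Hypothetical 40-hour week: on-call takes 30% of it.
week = {"project": 20, "on_call": 12, "other_ops": 8}
print(check_allocation(week))
```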

The Science of Sleep and Incident Response

On-call work doesn’t just affect morale - it directly impacts incident response quality.

Sleep Inertia: The 3 AM Liability

When a page wakes you at 3 AM, you’re not operating at full capacity. Sleep inertia is a transition state that occurs upon waking in which alertness and cognitive performance are temporarily degraded.

Research shows:

The implications for incident response are clear: the person responding to your 3 AM outage is cognitively impaired. Building systems that reduce the need for 3 AM responses isn’t just humane - it produces better outcomes.

The Anxiety Effect

Even when the pager doesn’t go off, being on-call affects sleep. Studies have found that on-call workers frequently experience difficulties getting to sleep as well as reduced sleep quality and quantity, sometimes even in the absence of a call.

Research examining how anxiety about missing alarms affects on-call workers found that pre-bed anxiety was increased during on-call conditions, and total sleep time was shorter with lower sleep efficiency compared to control conditions.

Building Sustainable On-Call

The path to sustainable on-call involves structural changes, not just individual coping strategies.

1. Limit Shift Length

An on-call rotation that has to handle one or more pages per day must be structured in a sustainable way, with recommended shift lengths limited to 12 hours. Shorter shifts are better for mental health.

Team members run the risk of exhaustion when shifts run long, and when people are tired, they make mistakes.

2. Avoid Consecutive Shifts

Continuous on-call shifts without sufficient breaks can negatively impact mental and physical health. Organizations should:

  • Limit consecutive on-call days
  • Avoid scheduling employees for back-to-back shifts
  • Ensure adequate recovery time between rotations
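
The shift-length limit and the back-to-back rule can both be checked before a schedule is published. The sketch below assumes a hypothetical `Shift` record and a 12-hour minimum rest period; the rest period is an assumption for illustration, not a figure from the research cited here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Shift:
    engineer: str
    start: datetime
    end: datetime


def find_violations(
    shifts: list[Shift],
    max_length: timedelta = timedelta(hours=12),
    min_rest: timedelta = timedelta(hours=12),  # assumed recovery window
) -> list[str]:
    """Return human-readable problems found in a proposed schedule."""
    violations = []
    last_end: dict[str, datetime] = {}
    for shift in sorted(shifts, key=lambda s: s.start):
        if shift.end - shift.start > max_length:
            violations.append(f"{shift.engineer}: shift is longer than {max_length}")
        previous_end = last_end.get(shift.engineer)
        if previous_end is not None and shift.start - previous_end < min_rest:
            violations.append(f"{shift.engineer}: less than {min_rest} rest before shift")
        last_end[shift.engineer] = shift.end
    return violations


schedule = [
    Shift("ana", datetime(2025, 1, 6, 8), datetime(2025, 1, 6, 20)),
    Shift("ana", datetime(2025, 1, 6, 20), datetime(2025, 1, 7, 8)),  # back-to-back
    Shift("ben", datetime(2025, 1, 7, 8), datetime(2025, 1, 7, 22)),  # 14 hours
]
problems = find_violations(schedule)
print(problems)
```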

3. Eliminate Night Shifts with Follow-the-Sun

For globally distributed teams, a multi-site “follow the sun” rotation allows teams to avoid night shifts altogether.

The model works by having teams in different time zones hand off coverage:

Region | Coverage Window (Americas time)
Americas | 8 AM - 4 PM
Asia-Pacific | 4 PM - 12 AM
Europe | 12 AM - 8 AM

Each team works during their normal business hours while providing 24/7 coverage globally.

Night shifts are demanding and draining, and they can harm employee health and well-being. Follow-the-sun eliminates them entirely for teams with a global presence.
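
A hand-off like this can be sketched as a lookup on the current UTC hour. The three UTC windows below are assumptions for illustration, chosen so that each region covers roughly its own daytime; a real rotation would take them from the team’s scheduling tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical hand-off windows as (start_hour_utc, end_hour_utc, region).
WINDOWS = [
    (0, 8, "Asia-Pacific"),
    (8, 16, "Europe"),
    (16, 24, "Americas"),
]


def on_call_region(now: datetime) -> str:
    """Return the region that holds the pager at the given moment."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    hour = now.astimezone(timezone.utc).hour
    for start, end, region in WINDOWS:
        if start <= hour < end:
            return region
    raise ValueError(f"no window covers hour {hour}")


# 10:00 at UTC-8 is 18:00 UTC, which falls in the Americas window.
pacific_morning = datetime(2025, 1, 6, 10, tzinfo=timezone(timedelta(hours=-8)))
print(on_call_region(pacific_morning))
```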

4. Involve Engineers in Scheduling

Forcing people onto an on-call schedule they had no say in hurts both wellbeing and productivity. Involving engineers in the process:

  • Keeps scheduling transparent
  • Allows everyone to provide feedback
  • Accommodates personal commitments and preferences
  • Creates buy-in and shared ownership

5. Enable Manager Support and Recovery

Hands-on managers can recognize when a responder’s on-call experience has been particularly stressful. Managers should:

  • Understand what a typical shift looks like
  • Recognize when someone has carried an unusually large burden
  • Encourage and enable time off to recover after difficult shifts

Good management visibility prevents the 72% problem described earlier, where management has little to no visibility into how on-call is affecting their teams.

The Compensation Question

One indicator of whether an organization takes on-call seriously: do they compensate for it?

Companies That Pay

Companies like Google, Intercom, Spotify, LaunchDarkly, CircleCI, and PayPal compensate at or above $1,000 USD per week for on-call responsibilities.

According to The Pragmatic Engineer’s analysis:

  • Major tech companies often pay $600-$1,000+ USD per week
  • Well-funded startups typically offer $400-$800 USD per week
  • UK companies show a higher propensity for on-call compensation than US companies

Google’s Model

Google’s compensation model is particularly thoughtful. For every hour you are on-call outside 08:00-18:00 local time:

  • If your response SLA is 30-60 minutes or less: 1/3 time-in-lieu
  • If your response SLA is 5 minutes or less: 2/3 time-in-lieu
  • Time-in-lieu can be used for vacation or paid out quarterly
  • Hard cap of 80 hours per quarter

At the cap, that adds up to 320 hours a year (4 quarters × 80 hours), or 8 additional weeks of vacation or pay at 40 hours per week.
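
The accrual rules can be worked through in code. This is a sketch of the model as described in this article, not an official calculator: the function names are hypothetical, and treating every SLA between 5 and 60 minutes as the 1/3 tier is an assumption, since only the two tiers above are given.

```python
from fractions import Fraction

QUARTERLY_CAP_HOURS = 80


def lieu_rate(response_sla_minutes: int) -> Fraction:
    """Time-in-lieu earned per out-of-hours on-call hour."""
    if response_sla_minutes <= 5:
        return Fraction(2, 3)
    if response_sla_minutes <= 60:  # assumption: covers the 30-60 minute tier
        return Fraction(1, 3)
    raise ValueError("no tier described for SLAs slower than 60 minutes")


def quarterly_time_in_lieu(out_of_hours_on_call: float, response_sla_minutes: int) -> float:
    """Hours of time-in-lieu for one quarter, after applying the cap."""
    earned = Fraction(out_of_hours_on_call) * lieu_rate(response_sla_minutes)
    return float(min(earned, QUARTERLY_CAP_HOURS))


# 150 out-of-hours hours at a 5-minute SLA earns 100 hours, capped at 80.
capped = quarterly_time_in_lieu(150, 5)
# At the cap every quarter: 4 * 80 = 320 hours, or 8 weeks of 40 hours.
weeks_per_year = 4 * capped / 40
print(capped, weeks_per_year)
```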

Why Compensation Matters

Companies that care about healthy on-call practices, or that want to minimize attrition, make it clear that on-call is additional work and compensate for it in some form.

Compensation can take various forms:

  • Cash payments
  • Time off in lieu
  • Lightening the load with dedicated SRE staff
  • Making rotations voluntary

The specific model matters less than the signal it sends: this work is valued and recognized as a burden beyond normal responsibilities.

Reduce the Pages, Not Just the Pain

The best on-call improvement is fewer pages. Every page that doesn’t need to happen is stress avoided.

Fix Alert Fatigue

Alert fatigue occurs when engineers are overwhelmed by too many alerts and can’t properly triage and respond. Teams must:

  • Identify what alerts genuinely need immediate human response
  • Determine what can be automated
  • Decide what can wait until morning versus requiring a 3 AM page

Some teams report their on-call engineer gets paged about a hundred times in a typical 24-hour shift, with many pages getting ignored while real problems get buried. This is a system failure, not an individual one.
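
The three triage questions above map naturally onto a routing decision. The sketch below is one possible shape for that decision; the `Alert` fields, the `Action` names, and the sample alerts are all hypothetical, and a real setup would express the same logic in the alerting tool’s own routing rules.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PAGE = "page the on-call engineer now"
    AUTOMATE = "run the automated fix"
    TICKET = "queue for business hours"


@dataclass(frozen=True)
class Alert:
    name: str
    needs_immediate_human: bool
    has_automated_fix: bool


def route(alert: Alert) -> Action:
    """Decide what an alert should do instead of paging by default."""
    if alert.has_automated_fix:
        return Action.AUTOMATE  # a known fix should not wake anyone
    if alert.needs_immediate_human:
        return Action.PAGE
    return Action.TICKET  # it can wait until morning


alerts = [
    Alert("checkout returning 5xx", needs_immediate_human=True, has_automated_fix=False),
    Alert("worker out of memory", needs_immediate_human=True, has_automated_fix=True),
    Alert("disk at 70%", needs_immediate_human=False, has_automated_fix=False),
]
decisions = {a.name: route(a) for a in alerts}
print(decisions)
```

Checking for an automated fix first is a deliberate choice in this sketch: if automation fails, it can still escalate to a page, but the default is to let the machine try.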

Invest in Reliability

Every recurring page represents an opportunity for permanent improvement. What happens after an incident defines culture more than the incident itself. Postmortems should identify not just what failed, but why - and drive lasting fixes.

If the same issue keeps paging, that’s not bad luck. It’s technical debt demanding attention.

Automate Self-Healing

Many pages are for issues that have known, automatable solutions. Invest in:

  • Auto-scaling for capacity issues
  • Automatic restarts for known failure modes
  • Self-healing infrastructure
  • Automated runbooks for common problems
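
As an illustration of automatic restarts for a known failure mode, the sketch below tries a bounded number of restarts and pages a human only if they all fail. The `self_heal` function and its callbacks are hypothetical stand-ins; in practice the health check, restart, and page would call your own monitoring and orchestration tools.

```python
import time
from typing import Callable


def self_heal(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    page: Callable[[str], None],
    max_restarts: int = 3,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try a bounded number of restarts; page a human only if they all fail."""
    if is_healthy():
        return True
    for _ in range(max_restarts):
        restart()
        sleep(wait_seconds)  # give the service time to come back
        if is_healthy():
            return True
    page(f"still unhealthy after {max_restarts} automatic restarts")
    return False


# Usage with stand-in callbacks: the fake service recovers on the second restart.
state = {"restarts": 0}
pages: list[str] = []


def fake_restart() -> None:
    state["restarts"] += 1


recovered = self_heal(
    is_healthy=lambda: state["restarts"] >= 2,
    restart=fake_restart,
    page=pages.append,
    sleep=lambda seconds: None,  # skip real waiting in the example
)
print(recovered, state["restarts"], pages)
```

The restart cap is the important part: without it, automation can hide a persistent failure behind an endless restart loop instead of escalating.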

The 2025 State of DevOps Report noted that 57% of SREs still spend more than half their week on toil. Much of this toil represents automation opportunities.

The Sustainable On-Call Checklist

Use this checklist to evaluate your on-call practices:

Structural Health

  • Shifts are 12 hours or less
  • No back-to-back shifts for the same person
  • Adequate recovery time between rotations
  • Engineers have input into scheduling
  • Night shifts are avoided or shared fairly

Page Health

  • Average 2 or fewer pages per shift
  • Median pages per shift is near zero
  • Recurring pages get permanently fixed
  • Alerts are regularly reviewed and pruned
  • Non-urgent issues don’t page after hours

Cultural Health

  • On-call burden is visible to management
  • Difficult shifts are followed by recovery time
  • Compensation or recognition exists for on-call work
  • Postmortems focus on systems, not blame
  • Engineers can raise concerns without retaliation

Work Balance

  • On-call is no more than 25% of an engineer’s time
  • At least 50% of time is spent on project work
  • On-call engineers aren’t also expected to hit sprint goals
  • Mental health resources are available

The Long Game

Sustainable on-call isn’t about making an unsustainable situation slightly more bearable. It’s about building systems, processes, and culture that support both reliability and the humans who maintain it.

The organizations that get this right don’t view it as a trade-off. They understand that burned-out engineers make mistakes, miss problems, and eventually leave. Sustainable on-call isn’t just humane - it’s the foundation of sustainable reliability.

The pager will always exist. But it doesn’t have to be a source of dread. With thoughtful design, appropriate limits, and genuine organizational support, on-call can be a manageable part of engineering work rather than the thing that drives your best people away.


Effective alerting is the foundation of sustainable on-call. FlareWarden monitors from multiple global locations and validates issues before alerting, so your team only gets notified about real problems - reducing noise and protecting your engineers’ sleep.