Here’s a paradox: organizations are spending more than ever on monitoring, yet 82% of teams take over an hour to resolve production incidents - up from 74% in 2023, 64% in 2022, and 47% in 2021.
More monitoring tools. Worse outcomes.
The problem isn’t lack of data. Organizations have realized that nearly 70% of collected observability data is unnecessary, leading to inflated costs without improved visibility.
The real issue? Monitoring done wrong creates blind spots and noise that make problems harder to find, not easier. These are the anti-patterns that plague even sophisticated engineering teams.
Anti-Pattern #1: The Green Dashboard, Angry Customers Problem
This happens when monitoring only checks internal systems - servers running, databases responding, services healthy - without validating what customers actually experience.
Why Internal-Only Monitoring Fails
Internal monitoring suffers from “network blindness” - it can’t detect issues beyond your infrastructure. A real example: a client’s monitoring showed all green internally - perfect CPU usage, healthy memory levels. But customers couldn’t access the website. Why? A DNS issue that only external monitoring could catch.
Common blind spots:
- DNS failures - Your servers are up, but nobody can find them
- CDN issues - Origin is healthy, but edge servers aren’t
- SSL problems - Certificate expired or misconfigured
- Third-party failures - Payment processor down, analytics broken
- Regional outages - Works from your office, broken in Europe
The Fix
Internal monitoring tells you why something is breaking. External monitoring tells you what your users are experiencing. You need both.
Add synthetic monitoring from multiple geographic locations that tests your application the way customers access it - through DNS, CDNs, and the public internet.
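As a rough sketch, an external check exercises the same path a customer does - DNS resolution, TLS handshake, HTTP response - and an aggregator requires failures from multiple regions before alerting, which also cuts single-probe flakiness. Everything here (region names, the `probe` and `aggregate` helpers) is illustrative, not any specific product's API:

```python
# Illustrative multi-region synthetic check. Probe results would come from
# real probes in different regions; here they are plain (state, detail) pairs.
import socket
import urllib.parse
import urllib.request

def probe(url, timeout=5):
    """One synthetic check: DNS + TLS + HTTP, the way a customer connects."""
    try:
        host = urllib.parse.urlparse(url).hostname
        socket.getaddrinfo(host, 443)        # DNS failures surface here
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return ("up", resp.status)       # TLS + HTTP checked by urlopen
    except Exception as exc:                 # expired cert, NXDOMAIN, timeout...
        return ("down", type(exc).__name__)

def aggregate(results):
    """Alert only if a majority of regions see a failure."""
    down = [region for region, (state, _) in results.items() if state == "down"]
    return {"healthy": len(down) <= len(results) // 2, "failing_regions": down}
```

A regional CDN or DNS issue then shows up as one failing region, while a real outage trips the majority rule.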
Anti-Pattern #2: Alert Fatigue Factory
When everything alerts, nothing does.
A survey by FireEye found that 37% of C-level security executives receive more than 10,000 alerts each month. Of those alerts, 52% were false positives and 64% were redundant.
How This Happens
Teams create alerts for everything “just in case.” CPU over 70%? Alert. Memory over 60%? Alert. Any error in the logs? Alert.
The result: too many false positives create a cacophony of alerts, making it difficult to focus on what truly matters. This “noise effect” causes even vigilant teams to miss critical issues.
One organization migrated away from a system that sent roughly 10,000 alerts per month. Most were false positives caused by inflexible configuration, so teams were barraged with alerts they had learned were probably meaningless.
The Fix
Ask for every alert: “What action does this require?”
If there’s no clear action, it’s not an alert - it’s a log entry or a dashboard metric. Implement:
- Alert deduplication - Multiple tools alerting on the same issue? Consolidate
- Smart thresholds - Use percentiles and baselines, not arbitrary numbers
- Duration requirements - Brief CPU spikes aren’t emergencies
- Severity levels - Not everything deserves a 3 AM page
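The duration and deduplication rules above can be sketched in a few lines. `AlertGate` and its parameters are made-up names for illustration, assuming one sample per evaluation interval:

```python
# Minimal sketch: only page when a breach persists for several consecutive
# samples, and deduplicate repeats so each incident pages exactly once.
from collections import defaultdict

class AlertGate:
    def __init__(self, min_breaches=3):
        self.min_breaches = min_breaches     # e.g. 3 consecutive 1-minute samples
        self.streak = defaultdict(int)
        self.open_alerts = set()

    def observe(self, fingerprint, breached):
        """Return True only when a *new*, sustained breach should page someone."""
        if not breached:
            self.streak[fingerprint] = 0
            self.open_alerts.discard(fingerprint)   # auto-resolve on recovery
            return False
        self.streak[fingerprint] += 1
        if (self.streak[fingerprint] >= self.min_breaches
                and fingerprint not in self.open_alerts):
            self.open_alerts.add(fingerprint)       # dedupe: one page per incident
            return True
        return False
```

A brief CPU spike resets the streak and never pages; a sustained breach pages once, not once per sample.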
Anti-Pattern #3: Monitoring the Wrong Metrics
Your CPU usage is 40%. Memory is at 65%. Disk I/O looks normal.
Your customers still can’t complete checkout.
System Metrics vs. User Experience
System metrics like CPU usage, memory usage, disk I/O, and network traffic tell you about infrastructure health. They don’t tell you if users are having a good experience.
The disconnect: your database might be responding in 5ms (great!), but if the API layer has a bug that causes 10-second timeouts for certain queries, customers suffer while your dashboards show green.
Vanity Metrics in Disguise
Vanity metrics look good on paper but don't inform decisions. In monitoring, common vanity metrics include:
- Uptime percentage (without defining what “up” means)
- Total requests served (without error rates)
- Average response time (hiding tail latency)
If a metric doesn’t change how you run the business, it doesn’t matter.
The Fix
Focus on the Golden Signals:
- Latency - How long requests take
- Traffic - How much demand you’re handling
- Errors - Rate of failed requests
- Saturation - How “full” your service is
These directly reflect user experience, not just infrastructure state.
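As a minimal sketch, all four signals can be derived from one window of request records. The record fields (`duration_ms`, `status`, `in_flight`) and the `capacity` parameter are assumptions for the example, and the percentile uses a simplified nearest-rank method:

```python
# Compute the four Golden Signals from a window of request records.
def golden_signals(requests, window_seconds, capacity):
    durations = sorted(r["duration_ms"] for r in requests)
    errors = sum(1 for r in requests if r["status"] >= 500)
    return {
        # Latency: tail, not average (see the next anti-pattern)
        "latency_p99_ms": durations[min(len(durations) - 1, int(len(durations) * 0.99))],
        "traffic_rps": len(requests) / window_seconds,      # Traffic
        "error_rate": errors / len(requests),               # Errors
        "saturation": max(r["in_flight"] for r in requests) / capacity,  # Saturation
    }
```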
Anti-Pattern #4: The Averages Lie
Your average API response time is 50ms. Looks great!
But some requests are taking 2,500ms - and those users are having a terrible experience.
Why Averages Hide Problems
Averages smooth over outliers - the requests that take far longer than typical. Those outliers, visible in P99 latency, hit real users hard and often point to underlying issues that averages never reveal.
Consider: if 99 requests complete in 10ms and one request takes 10 seconds, the average is ~110ms. That looks acceptable. But one in every hundred users is having a 10-second experience.
Tail latency is often where systemic bottlenecks and rare bugs surface. A service’s mean latency might look stable while P99 is spiking due to database locks, cache evictions, or garbage collection pauses.
The Fix
Monitor and alert on percentiles, not just averages. Use:
- P50 - Median experience (typical user)
- P95 - 95th percentile (bad experience threshold)
- P99 - 99th percentile (worst-case scenarios)
Don't use the mean as your primary latency SLI; real-world latency distributions are almost always skewed.
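The worked example above is easy to verify in code. The `percentile` helper below uses a simplified nearest-rank method; production systems typically use histograms or sketches (HDR histogram, t-digest) instead:

```python
# 99 requests at 10 ms and one at 10,000 ms: the mean looks fine, P99 doesn't.
from statistics import mean

def percentile(samples, p):
    """Nearest-rank percentile (simplified for illustration)."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(len(s) * p / 100))]

latencies = [10] * 99 + [10_000]
print(mean(latencies))            # 109.9  -- "acceptable"
print(percentile(latencies, 50))  # 10     -- typical user is fine
print(percentile(latencies, 99))  # 10000  -- the experience the mean hides
```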
Anti-Pattern #5: Invisible Dependencies
Your application works perfectly. Then, at 3 AM, users can’t click the “Complete Purchase” button.
Why? A CDN hosting icons used by a JavaScript library in the payment interface went down. Nobody knew they depended on that CDN - it wasn’t in any architecture diagram or runbook.
The Dependency Blind Spot
Studies show that 30-40% of SLA violations stem from external dependency failures. But most organizations don’t monitor their third-party dependencies systematically.
Most companies know their first-degree dependencies, but not their second or third-degree ones. You know you use AWS, but do you know which services AWS depends on? You know your SaaS tool is critical, but do you know what CDN they use?
The Fix
- Map dependencies honestly - Document not just vendors, but vendors’ vendors
- Monitor critical third parties - Payment processors, auth services, CDNs
- Subscribe to status pages - Get notified when dependencies have issues
- Test graceful degradation - What happens when a dependency fails?
Actively monitor all vendors that directly impact customer experience - typically 10-30 services covering payment processors, cloud providers, and critical SaaS applications.
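Graceful degradation around one such dependency might look like the sketch below: a hard timeout plus a last-known-good cache, so an outage degrades the experience instead of breaking it. The `fetch_exchange_rate` callable and the cache shape are hypothetical stand-ins:

```python
# Sketch: wrap a third-party call with a timeout and a stale-but-safe fallback.
import time

_cache = {"rate": 1.0, "ts": 0.0}   # last-known-good value (illustrative)

def get_rate(fetch_exchange_rate, max_age=3600):
    try:
        rate = fetch_exchange_rate(timeout=2)    # hard timeout on the dependency
        _cache.update(rate=rate, ts=time.time())
        return rate, "live"
    except Exception:
        if time.time() - _cache["ts"] < max_age:
            return _cache["rate"], "stale"       # degrade: serve last-known value
        raise                                     # no safe fallback left
```

The `"stale"` marker matters: it lets you count degraded responses, which is exactly the dependency signal most teams never monitor.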
Anti-Pattern #6: Tool Sprawl Chaos
How many monitoring tools does your organization use?
Over half (52%) of companies use more than six observability tools, with 11% using more than 16. A separate survey found 39% of respondents juggling 11 to 30 monitoring tools.
Why This Happens
Every innovation brings specialized monitoring tools. Containers need container monitoring. Kubernetes needs Kubernetes monitoring. Different teams adopt their preferred tools. Soon you have overlapping solutions everywhere.
The Consequences
- Context switching - Engineers lose time flipping between interfaces
- Data silos - Logs, metrics, and traces live in different systems
- Redundant alerts - Multiple tools firing for the same incident
- Skyrocketing costs - Licenses, training, and storage fees accumulate
The Fix
Consolidate where possible. A unified observability approach allows organizations to reduce spending while creating a single source of truth. You don’t need zero tools - you need fewer, better-integrated tools.
Anti-Pattern #7: Set It and Forget It
The monitoring system was perfect - three years ago, when it was set up.
Since then, the application has changed. New services. Different traffic patterns. Shifted priorities. But the alerts and dashboards? Exactly the same.
Configuration Drift
What made sense in 2022 may be wrong in 2025:
- Thresholds based on old traffic patterns
- Alerts for services that no longer exist
- Missing coverage for new critical paths
- Dashboards showing deprecated metrics
Nobody reviews the monitoring configuration. It just… runs. Until it misses something critical.
The Fix
Schedule regular monitoring reviews:
- Monthly: Review alert volumes and false positive rates
- Quarterly: Audit dashboards for relevance
- After major changes: Update monitoring when architecture changes
- After incidents: Add monitoring to catch similar issues
Monitoring is not a one-time setup - it’s ongoing maintenance.
Anti-Pattern #8: Copy-Paste Dashboards
The team adopted a “best practices” dashboard template from a blog post. It shows all the metrics that post recommended.
Nobody knows what half of them mean.
The Template Trap
Out-of-the-box templates are a one-size-fits-all approach. Copy-pasting a dashboard skips the real work: understanding what each metric means for your business.
These templates should be a starting point rather than an ending point to understanding your data. Generic dashboards don’t reflect your specific architecture, traffic patterns, or business priorities.
The Fix
Start with templates, then customize:
- Remove metrics you don’t understand or use
- Add metrics specific to your application
- Organize by user journey, not system component
- Ensure every panel answers a specific question
If you can’t explain why a metric is on your dashboard, remove it.
Anti-Pattern #9: Collecting Everything “Just in Case”
Storage is cheap, right? Better to have data and not need it than need it and not have it.
So you log everything. Trace everything. Metric everything.
The Data Hoarding Problem
Organizations have realized that nearly 70% of collected observability data is unnecessary. This leads to:
- Inflated costs - Storage and processing add up
- Slower queries - More data means slower dashboards
- Harder debugging - Signal buried in noise
- Compliance risk - Storing data you don’t need
91% of respondents are employing methods to reduce observability spend, including trying to collect less monitoring data.
The Fix
Be intentional:
- Define retention policies - Not all data needs to live forever
- Sample high-volume data - You don’t need every single trace
- Tier your storage - Hot data for recent, cold for historical
- Review regularly - Delete metrics no one queries
The goal is signal, not volume.
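One common way to be intentional about traces is head-based sampling: keep every error, and a deterministic fraction of successes, hashed on the trace ID so all spans of a trace make the same decision. This is a generic sketch, not any specific vendor's sampler:

```python
# Keep all error traces; keep a deterministic sample_pct% of successful ones.
import hashlib

def keep_trace(trace_id, is_error, sample_pct=1):
    if is_error:
        return True                          # never drop failures
    # Hash-based bucketing: the same trace_id always lands in the same bucket,
    # so sampling decisions are consistent across services.
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 100
    return bucket < sample_pct
```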
Anti-Pattern #10: Observability Without Action
The most sophisticated monitoring in the world is worthless if nobody acts on it.
This happens when:
- Alerts go to an email inbox nobody checks
- Dashboards exist but aren’t part of daily workflows
- Data is collected but never analyzed
- Incidents happen and monitoring data isn’t consulted
The Visibility Illusion
Only 10% of organizations are actually practicing full observability of their applications and infrastructure. Many have tools; few have practices.
Monitoring is not a checkbox. It’s a capability that requires:
- Clear ownership of alert response
- Documented runbooks for common issues
- Regular review of monitoring data
- Postmortems that improve monitoring coverage
The Fix
Connect monitoring to action:
- Route alerts to the right people - Not generic channels
- Create runbooks - What to do when alerts fire
- Review during incidents - Consult dashboards actively
- Improve after incidents - Add monitoring to catch similar issues
The Anti-Pattern Audit
Use this checklist to evaluate your monitoring practices:
| Anti-Pattern | Warning Signs | Fix |
|---|---|---|
| Internal-only monitoring | Green dashboards during customer complaints | Add external synthetic monitoring |
| Alert fatigue | >100 alerts/day, most ignored | Reduce to actionable alerts only |
| Wrong metrics | System metrics good, users unhappy | Focus on Golden Signals |
| Averages only | P50 looks fine, users complain | Track P95/P99 percentiles |
| Invisible dependencies | Surprised by third-party outages | Map and monitor dependencies |
| Tool sprawl | 6+ monitoring tools | Consolidate platforms |
| Set and forget | Config unchanged for years | Schedule regular reviews |
| Copy-paste dashboards | Panels you don’t understand | Customize for your needs |
| Data hoarding | High costs, slow queries | Collect intentionally |
| No action | Alerts ignored, data unused | Connect to clear processes |
Better Monitoring, Not More Monitoring
The path forward isn’t more tools, more data, or more dashboards. It’s smarter monitoring that:
- Reflects what customers actually experience
- Alerts only when action is required
- Measures what matters to the business
- Gets reviewed and improved over time
The real issue is not a lack of data; it’s a lack of context. Fix the context, and your monitoring becomes a superpower instead of a burden.
FlareWarden is designed around these principles: external monitoring from the customer perspective, configurable alerts that reduce noise, and status pages that turn monitoring data into customer communication.