Cron Monitors
Monitor scheduled jobs, cron tasks, and background workers. FlareWarden detects missed runs, hung jobs, and explicit failures so you know your critical background processes are running on schedule.
Quick Start: Add a Cron Monitor in 60 Seconds
- Open a parent uptime monitor and click Add Cron Monitor
- Set a name, expected schedule, and grace period
- Copy the unique ping URL
- Add a single curl call to your cron job
Overview
What cron monitors are and why they matter
Most monitoring tools watch from the outside — they periodically request a URL and check whether it responds. This works well for web services, but it can't observe jobs that run internally on your infrastructure: database backups, report generators, queue workers, data sync tasks, and the dozens of other scheduled processes that keep a production system healthy.
When these jobs fail silently, the consequences range from stale data to data loss to cascading service degradation — and you may not find out until a customer reports a problem days later. Cron monitors solve this blind spot.
A cron monitor in FlareWarden is a lightweight heartbeat check for a single scheduled job. Your job sends a small HTTP ping to a unique URL when it starts and when it completes (or fails). FlareWarden tracks these signals and alerts you when:
- A job doesn't run at all (missed run)
- A job starts but never finishes (hung job)
- A job explicitly reports an error (explicit failure)
Because cron monitors are nested under your existing uptime monitors, all your background job health surfaces in the same dashboard and status page as your web services — giving you one unified view of your infrastructure.
How Cron Monitoring Works
Push-based model and the expect-then-verify loop
Push-Based Model
Unlike uptime monitors that poll your services, cron monitors use a push model. FlareWarden doesn't reach out to your servers — your jobs reach out to FlareWarden. This means:
No firewall changes required
Outbound HTTPS from your servers is all that's needed. FlareWarden never needs inbound access to your network.
Works anywhere jobs run
Linux crontab, Kubernetes CronJobs, GitHub Actions scheduled workflows, serverless functions — if it can make an outbound HTTP request, it can send a ping.
No agent to install
A single curl command is the entire integration. No daemon, no SDK dependency, no credentials file.
UUID as bearer token
Each cron monitor gets a unique UUID baked into its ping URL. The URL itself is the credential — no API keys or authentication headers needed.
The Expect-then-Verify Loop
When you create a cron monitor you tell FlareWarden two things: how often your job runs and how long to wait before marking it late. FlareWarden uses these to calculate a next expected time for each run. If no successful ping arrives by that time plus the grace period, the monitor transitions to Late and an alert fires.
Every time a successful complete ping arrives, FlareWarden advances next expected time to the next scheduled occurrence, keeping the monitor perpetually tracking the future.
This loop — set expectation, wait, verify, advance — is what lets FlareWarden detect missed runs with no external polling.
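The "verify" step of this loop comes down to one comparison. The sketch below illustrates the rule as described here and is not FlareWarden's implementation. The function name is_late is hypothetical, and all three arguments are assumed to be whole seconds (Unix timestamps for the first two, a duration for the third).

```shell
# is_late NOW NEXT_EXPECTED GRACE
# Exit status 0 when NOW is past the deadline (NEXT_EXPECTED + GRACE).
# A ping that arrives exactly at the deadline still counts as on time.
is_late() {
  [ "$1" -gt $(( $2 + $3 )) ]
}
```

For a run expected at 1700000000 with a 300-second grace period, `is_late 1700000301 1700000000 300` succeeds, while the same call with NOW set to 1700000300 does not.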
Heartbeat Monitoring
Simplified monitoring for always-running processes
What Is Heartbeat Monitoring?
Heartbeat monitoring is a simplified monitoring mode designed for always-running processes — background workers, queue consumers, daemons, and long-running services. Instead of the full start/complete/fail lifecycle used by cron jobs, your process simply pings a single URL at regular intervals. If FlareWarden stops receiving pings, it knows the process has stalled or crashed and fires an alert.
Cron Monitoring vs. Heartbeat Monitoring
Both modes use the same push-based model, but they're designed for different workloads:
Cron Monitoring
For scheduled jobs with a defined lifecycle: start, complete, or fail. Uses cron expressions (*/15 * * * *) or fixed intervals to define when runs are expected. Detects missed runs, hung jobs, and explicit failures.
Heartbeat Monitoring
For continuously-running processes that periodically check in. No start/complete/fail signals needed — just a single ping at a regular interval. If the ping stops arriving, the monitor transitions to Late and alerts fire.
Setup
- Create a cron monitor and select Heartbeat as the schedule type.
- Choose a check-in interval — how often your process should ping (e.g., every 60 seconds).
- Set a grace period — this should be longer than your typical deploy/restart time to avoid false alerts during deployments.
- Integrate with a single curl call:
curl -fsS -m 10 https://app.flarewarden.com/ping/YOUR-TOKEN
Grace Period Guidance
Tip: When setting the grace period for heartbeat monitors, account for the time your service takes to restart during deploys. For example, if your deployments typically take 3 minutes, set the grace period to at least 5 minutes. This prevents false alerts during routine deployments.
Integration Examples
Each example sends a heartbeat ping inside a loop. The ping URL is the only integration point — add it alongside your existing process loop and you're done.
#!/bin/bash
PING_URL="https://app.flarewarden.com/ping/YOUR-TOKEN"
while true; do
  # Your process work here
  do_work
  # Send heartbeat
  curl -fsS -m 10 "$PING_URL" > /dev/null
  sleep 60
done
Signal Lifecycle & State Machine
Five states, three signals, deterministic transitions
A cron monitor always exists in one of five states. Transitions are deterministic: every state change is triggered by a specific event (a ping arriving, or time passing). Understanding the state machine helps you configure the right grace period and interpret what you see in the dashboard.
Start Signal
Ping /start when your job begins to track execution duration and detect hung processes. Optional, but recommended for long-running jobs.
Complete Signal
Ping the base URL or /complete when the job finishes successfully. This is the primary health signal and advances the next expected run time.
Fail Signal
Ping /fail to explicitly report a failure with an optional error message body (up to 10 KB). Triggers an immediate alert.
The Five States
| Status | Meaning | Triggered By | Alerts? |
|---|---|---|---|
| Pending | Newly created; awaiting first ping | Monitor creation (initial state) | No |
| OK | Job completed successfully and on time | /complete ping received within the expected window | Recovery alert if previously degraded |
| Running | Job has started but not yet completed | /start ping received | No (unless max run duration exceeded) |
| Late | Expected ping not received within the grace period | No ping received after schedule + grace period elapses | Yes: "missed run" alert |
| Failed | Job explicitly failed, or ran too long and hung | /fail ping received, or max run duration exceeded | Yes: "failure" or "hung job" alert |
State Transitions in Detail
All transitions follow these deterministic rules:
Pending → OK
The very first /complete ping received after monitor creation moves the monitor from Pending to OK and sets the first next expected time. No alert fires; this is simply the monitor coming online.
OK → Running
A /start ping moves the monitor to Running. FlareWarden records the start timestamp and begins measuring execution duration. If a max run duration is configured, the clock starts ticking.
Running → OK
A /complete ping closes the run, records the execution duration, and advances next expected time. The monitor returns to OK. If an incident was open, it is resolved and a recovery alert fires.
OK → Late (missed run)
If no ping arrives by next_expected_at + grace_period, FlareWarden's background checker (which runs every 30 seconds) transitions the monitor to Late, opens an incident, and fires a "missed run" alert. The monitor stays Late until it receives a complete or start ping.
Running → Failed (hung job)
If a max run duration is configured and the job has been in Running state longer than that limit, FlareWarden marks it Failed and opens a "hung job" incident. This catches processes that started but got stuck in an infinite loop or waiting for a resource that never responds.
Failed / Late → OK (recovery)
From any unhealthy state, a /complete ping returns the monitor to OK. The open incident is resolved, next expected time is recalculated, and a recovery alert notifies your team that the job is healthy again.
Tip: Consecutive failures and missed runs are tracked separately (consecutive_failures and consecutive_misses). These counters help you distinguish a one-off hiccup from a recurring problem and are visible on the monitor's detail page.
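The transition rules above can be summarised as a small lookup function. This is a minimal sketch for reasoning about the state machine, not FlareWarden's code. The function name next_state and the event names deadline_missed and max_run_exceeded are hypothetical labels for "time passed" events. Two behaviours are assumptions where the text is silent: a /start ping moves any state to Running, and a timer event that does not apply leaves the state unchanged.

```shell
# next_state CURRENT EVENT
# Prints the state that results from applying EVENT to CURRENT.
next_state() {
  case "$2" in
    complete) echo "OK" ;;        # a complete ping always returns to OK
    fail)     echo "Failed" ;;    # explicit failure
    start)    echo "Running" ;;   # assumption: allowed from any state
    deadline_missed)
      # Only a monitor that was OK becomes Late; Pending keeps waiting.
      if [ "$1" = "OK" ]; then echo "Late"; else echo "$1"; fi ;;
    max_run_exceeded)
      # Hung-job detection applies only while Running.
      if [ "$1" = "Running" ]; then echo "Failed"; else echo "$1"; fi ;;
    *) echo "$1" ;;               # unknown events change nothing
  esac
}
```

For example, `next_state Pending complete` prints OK, and `next_state Running max_run_exceeded` prints Failed.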
Nesting Under Parent Uptime Monitors
How cron monitors relate to parent uptime monitors
Every cron monitor belongs to a parent uptime monitor. This design reflects how infrastructure actually works: a website usually has a collection of background jobs that support it — cache warmers, email queues, nightly report generators — and all of these are conceptually part of the same service.
Why Nesting?
Unified dashboard view
Your uptime monitor's detail page shows all its cron monitors in one place. You don't need to navigate to a separate section to see background job health.
Status page coherence
When a cron monitor is unhealthy, the parent service on your public status page can reflect that degraded state automatically — depending on the cron monitor's configured severity.
Shared alert routing
Alerts fire through the same channels (email, webhook) as the parent uptime monitor, so your team receives cron failures in the same place as all other FlareWarden alerts.
Shared monitor pool
Cron monitors count against the same unified monitor limit as uptime, content, and dependency monitors — no separate quota to manage.
Severity Levels & Parent Status Roll-up
Each cron monitor has a severity setting that controls how its failures affect the parent uptime monitor's status and the urgency of alerts:
| Severity | Effect on Parent Monitor | Use When |
|---|---|---|
| Critical | Parent marked Down; status page shows the service as fully offline; high-priority alert dispatched | The job is essential and its failure directly impacts users (e.g. payment processing, user email delivery) |
| Degraded | Parent marked Degraded (partial outage); status page shows the service as impaired; standard alert dispatched | The job's failure degrades service quality but doesn't take the site fully down (e.g. cache warming, report generation) |
| Notify Only | Parent status unchanged; status page unaffected; alert still dispatched internally | The job is internal infrastructure whose failure is invisible to end users (e.g. log rotation, analytics aggregation) |
When multiple cron monitors are failing simultaneously, the parent monitor reflects the worst severity among all open incidents. For example, if one Degraded and one Critical cron monitor are both failing, the parent shows as Down. Once the critical incident resolves, the parent automatically recalculates and returns to Degraded state.
Default severity: New cron monitors default to Degraded. Change the severity from the cron monitor's settings at any time without losing historical data.
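The "worst severity wins" roll-up can be sketched as follows. This illustrates the rule in the table, not FlareWarden's implementation. The function name parent_status, the lowercase severity labels, and the word Unchanged (for "parent status is not affected") are all hypothetical.

```shell
# parent_status SEVERITY...
# Takes the severities of all cron monitors with an open incident and
# prints the resulting parent status: Down, Degraded, or Unchanged.
parent_status() {
  result="Unchanged"
  for severity in "$@"; do
    case "$severity" in
      critical)    result="Down"; break ;;   # nothing is worse than Down
      degraded)    result="Degraded" ;;
      notify_only) ;;                        # never affects the parent
    esac
  done
  echo "$result"
}
```

`parent_status degraded critical` prints Down. Once the critical incident resolves, `parent_status degraded` prints Degraded.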
Ping API Reference
Endpoints, request format, response codes, and rate limits
Each cron monitor gets a unique UUID-based ping URL. No API keys or authentication headers are required; the UUID itself acts as the token. Pings can be as simple as a single curl call with no body or headers.
Base URL:
https://app.flarewarden.com/ping/{uuid}
Replace {uuid} with your cron monitor's unique ping token, available on the monitor's detail page and in the ping_url field of management API responses.
Endpoint Overview
| Method(s) | Path | Signal | Use When |
|---|---|---|---|
| GET, POST, HEAD | /ping/{uuid} | Complete | Heartbeat: job finished successfully. Most common for quick jobs with no start signal. |
| GET, POST, HEAD | /ping/{uuid}/start | Start | Job began execution. Enables duration tracking and hung-job detection (optional). |
| GET, POST, HEAD | /ping/{uuid}/complete | Complete | Explicit success signal. Functionally identical to the base URL; use for readability in scripts that also send /start. |
| GET, POST, HEAD | /ping/{uuid}/fail | Fail | Job failed explicitly. Triggers an immediate alert. Accepts an optional error message body. |
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| uuid | UUID v4 string | Yes | The unique ping token for this cron monitor. Format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (36 characters: 32 hexadecimal digits plus 4 hyphens; case-insensitive). Found on the monitor's detail page and in the ping_url field of management API responses. |
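A malformed token is rejected with a 400, so it can be worth checking the shape locally before a script ships. The sketch below uses only a shell case pattern; the function name is_valid_ping_token is hypothetical. It checks the 8-4-4-4-12 layout and accepts upper- and lowercase hex digits, but does not verify the UUID version bits.

```shell
# is_valid_ping_token TOKEN
# Exit status 0 when TOKEN looks like xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
is_valid_ping_token() {
  h='[0-9a-fA-F]'
  h4="$h$h$h$h"
  # Unquoted expansions inside a case pattern are treated as patterns,
  # and case always matches the whole string.
  case "$1" in
    $h4$h4-$h4-$h4-$h4-$h4$h4$h4) return 0 ;;
    *) return 1 ;;
  esac
}
```

An unreplaced placeholder such as YOUR-UUID, a truncated token, or a token with trailing whitespace all fail this check.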
Request Body (Optional)
All body fields are optional; a ping with no body at all is perfectly valid. The body can be submitted as JSON (Content-Type: application/json), URL-encoded form data (Content-Type: application/x-www-form-urlencoded), or as query parameters appended to the URL. JSON is recommended when sending structured fields like metadata.
| Field | Type | Constraints | Description |
|---|---|---|---|
| message | string | Max 1,000 characters | Human-readable status message. For fail pings, describe the error. For complete pings, summarize what was processed. Stored in the run history and included in alert notifications. |
| error_message | string | Max 1,000 characters | Legacy alias for message (kept for backward compatibility). If both fields are present, message takes precedence. |
| exit_code | integer | −128 to 255 | The process exit code. Stored in the run history for debugging. Standard Unix convention: 0 = success, non-zero = failure. |
| duration_ms | integer | 0 to 2,592,000,000 (30 days in ms) | Wall-clock execution time in milliseconds. If omitted on a /complete ping that follows a /start ping, FlareWarden calculates the duration automatically from the recorded start timestamp. |
| metadata | object (JSON only) | Max 5 keys; key: max 100 chars; value: max 200 chars | Arbitrary key-value pairs stored with the ping result and visible in the run history. Useful for tagging runs with deployment version, environment, record count, or other job-specific context. Available via JSON body only (not as query parameters or form fields). |
Body size limit: The total request body must not exceed 10 KB. Bodies exceeding this limit are rejected with HTTP 400. JSON bodies are strictly validated — unknown fields cause a 400 error.
Response Codes
| Code | Meaning | When It Occurs |
|---|---|---|
| 200 OK | Ping accepted | The signal was received and processed. The JSON body contains status: "ok" and a ping_id uniquely identifying the recorded ping result. |
| 202 Accepted | Regional replay | The request arrived at a Fly.io region that doesn't own this monitor's data. The fly-replay response header is set and Fly's infrastructure re-routes the request automatically. You will never see this code directly because the replay is transparent to the caller. |
| 400 Bad Request | Invalid request | The UUID is malformed, the body is not valid JSON, the body exceeds 10 KB, a field value violates a constraint (message too long, exit_code out of range, unknown JSON field, etc.), or there is unexpected data after the JSON value. See the validation error format below for structured error details. |
| 404 Not Found | Unknown token | No active cron monitor exists for this UUID. The monitor may have been deleted, or the UUID may be truncated or incorrect. If you recently deleted and re-created the monitor, use the new ping URL. |
| 429 Too Many Requests | Rate limited | More than 60 pings per minute from the same source IP. Check the Retry-After response header for how many seconds to wait before retrying. Normal cron jobs are never rate-limited; see the rate limiting section below. |
| 500 Internal Server Error | Server error | FlareWarden encountered an unexpected error. The ping may not have been recorded. Retry with backoff using --retry 3. If the problem persists, check status.flarewarden.com. |
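A script that inspects the HTTP status itself could map the codes in this table to a next step along these lines. This is a sketch: the function name ping_action and the action labels are hypothetical, and treating 400 and 404 as "do not retry" is an inference from the table (retrying an invalid or unknown URL cannot succeed), not documented behaviour.

```shell
# ping_action HTTP_CODE
# Prints a suggested next step for the given response code.
ping_action() {
  case "$1" in
    200)     echo "done" ;;
    400|404) echo "fix_request" ;;        # retrying will not help
    429)     echo "wait_and_retry" ;;     # honour the Retry-After header
    500)     echo "retry_with_backoff" ;;
    *)       echo "unexpected" ;;
  esac
}
```

With curl, the status code can be captured on its own by combining `-s -o /dev/null -w '%{http_code}'`, and the result passed to this function.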
Response Format
All ping endpoints return Content-Type: application/json. Every response includes a status field of either "ok" or "error".
{
  "status": "ok",
  "message": "complete signal received",
  "ping_id": "01j9abc123def456"
}
ping_id uniquely identifies the recorded ping result row. Use it to correlate dashboard events with your job logs for debugging.
{
  "status": "error",
  "message": "not found"
}
Simple error for most failures. The message field is human-readable and safe to log.
{
  "status": "ok",
  "message": "monitor is paused, ping recorded but not processed"
}
A paused monitor still returns 200 so your job continues unmodified. The state machine is not updated and no alerts fire while paused.
{
  "status": "error",
  "message": "message: must be at most 1000 characters",
  "errors": [
    {
      "field": "message",
      "message": "must be at most 1000 characters"
    }
  ]
}
Validation failures include a structured errors array with one entry per invalid field. The top-level message summarises the first error for simple clients.
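When a JSON parser is not available on the host, the status field can be pulled out with sed. This is a rough sketch that relies on the response shapes shown above; the function name response_status is hypothetical, and a real JSON tool such as jq is more robust where it can be installed.

```shell
# response_status JSON
# Prints the value of the first "status" field found in JSON.
response_status() {
  printf '%s\n' "$1" |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
    head -n 1
}
```

Note that curl's -f flag suppresses the response body on HTTP errors, so a script that wants to read an error body would call curl without -f.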
Rate Limiting
Ping endpoints are protected by a per-IP token-bucket rate limiter to prevent abuse and ensure fair service for all customers.
| Limit | Window | Scope | Action When Exceeded |
|---|---|---|---|
| 60 requests | 1 minute | Per source IP address | HTTP 429 with Retry-After header indicating seconds until the window resets |
A typical cron job running every minute sends at most 2 pings per cycle (start + complete) — 120 pings/hour from a single IP, well within the limit. The rate limiter only activates for pathological traffic patterns such as tight retry loops or large numbers of jobs sharing a single NAT egress IP that collectively exceed 60 pings/min.
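For a shared egress IP, a back-of-the-envelope check of the average ping rate looks like this. The function names are hypothetical, and the arithmetic gives an average only: jobs that all fire in the same minute arrive as a burst, which a token-bucket limiter may treat less kindly than the average suggests.

```shell
# pings_per_minute JOBS PINGS_PER_RUN INTERVAL_MINUTES
# Prints the average number of pings per minute, rounded up.
pings_per_minute() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# exceeds_ping_limit JOBS PINGS_PER_RUN INTERVAL_MINUTES
# Exit status 0 when the average exceeds 60 pings per minute.
exceeds_ping_limit() {
  [ "$(pings_per_minute "$1" "$2" "$3")" -gt 60 ]
}
```

Forty jobs that each run every minute and send start plus complete average 80 pings per minute, which is over the limit. One hundred such jobs on a 5-minute schedule average 40.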
Endpoint Details & Examples
/ping/{uuid}: Heartbeat / success
The simplest way to report a successful run. Send at the end of your job. Equivalent to /ping/{uuid}/complete. Resolves any open incident and advances the next expected run time.
Minimal: GET with no body
curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID
With structured JSON body
curl -fsS --retry 3 -X POST \
  -H "Content-Type: application/json" \
  -d '{"message":"Processed 1,423 records","duration_ms":4200,"metadata":{"env":"prod","version":"2.1.0"}}' \
  https://app.flarewarden.com/ping/YOUR-UUID
With query parameters (zero-dependency)
curl -fsS --retry 3 "https://app.flarewarden.com/ping/YOUR-UUID?message=done&duration_ms=4200"
/ping/{uuid}/start: Signal job start (optional)
Send at the beginning of your job. Transitions the monitor to Running, records the start timestamp, and starts the hung-job clock (if a max run duration is configured). If no /complete or /fail ping follows within the max run duration, the monitor is marked Failed automatically.
Minimal: GET with no body
curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID/start
/ping/{uuid}/complete: Explicit success signal
Functionally identical to the base URL ping. Prefer this form when your script also calls /start so the symmetry is clear. If a /start was sent and duration_ms is omitted, FlareWarden auto-computes the duration from the start timestamp.
With exit code and duration
curl -fsS --retry 3 -X POST \
  -H "Content-Type: application/json" \
  -d '{"exit_code":0,"duration_ms":12500,"message":"Synced 3 tables"}' \
  https://app.flarewarden.com/ping/YOUR-UUID/complete
/ping/{uuid}/fail: Report explicit failure
Marks the job as Failed and triggers an immediate alert via all configured channels. The optional error message is stored in the run history and included in alert notifications. Use this in the error-handling path of your job so FlareWarden receives an explicit failure signal rather than waiting for the grace period to expire.
Simple failure: no body
curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID/fail
Failure with structured JSON error details
curl -fsS --retry 3 -X POST \
  -H "Content-Type: application/json" \
  -d '{"message":"Database connection refused after 3 retries","exit_code":1,"metadata":{"host":"db-primary","attempt":"3"}}' \
  https://app.flarewarden.com/ping/YOUR-UUID/fail
Failure via query parameters (no JSON needed)
curl -fsS --retry 3 \
  "https://app.flarewarden.com/ping/YOUR-UUID/fail?message=disk+full&exit_code=1"
Failure via URL-encoded form data
curl -fsS --retry 3 -X POST \
  --data-urlencode "message=backup failed: no space left on device" \
  --data-urlencode "exit_code=28" \
  https://app.flarewarden.com/ping/YOUR-UUID/fail
Configuration
Schedule, grace period, and max run duration settings
Schedule
Define how often your job is expected to run. FlareWarden supports two schedule formats:
Cron Expression
Standard 5-field cron syntax for precise schedules.
*/15 * * * *
Every 15 minutes
Simple Interval
Human-readable intervals for common schedules.
every 1 hour
Runs once per hour
Grace Period
A buffer window after the expected run time before the monitor is marked as Late. This accounts for natural timing variance in cron execution.
Default: 5 minutes. Can be set from 0 to 24 hours. A grace period of 0 means the monitor transitions to Late immediately when the expected time passes without a ping.
Max Run Duration
An optional upper bound on how long a job should run. If a job sends a /start ping and stays in Running state longer than this limit, FlareWarden marks it Failed and opens a "hung job" incident.
Default: Disabled (0). Enable only for jobs where an unusually long run is itself a signal of failure.
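The hung-job rule, including the "0 means disabled" default, can be written as a single test. This sketch illustrates the rule as described and is not FlareWarden's implementation; the function name is_hung is hypothetical and all arguments are assumed to be whole seconds.

```shell
# is_hung NOW STARTED_AT MAX_RUN
# Exit status 0 when a Running job has exceeded MAX_RUN seconds.
# MAX_RUN=0 disables the check, matching the documented default.
is_hung() {
  [ "$3" -gt 0 ] && [ $(( $1 - $2 )) -gt "$3" ]
}
```

A job started at 1700000000 with a 600-second limit is considered hung at 1700000601, and never considered hung when the limit is 0.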
Parent Monitor
Cron monitors are nested under a parent uptime monitor. This groups related monitors together — for example, a website's uptime check and its background job monitors appear together in the dashboard and on status pages. See the Nesting Under Parent Uptime Monitors section above for details on how severity levels affect the parent's status.
Integration Examples
Code examples in 8 languages with full lifecycle patterns
Adding cron monitoring takes a single HTTP request. Pick your language and copy the full-lifecycle pattern — start, run your job, then report success or failure.
# Simple heartbeat — fire and forget after your job completes
curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID
# Full lifecycle with start/complete/fail signals
curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID/start
# ... your job runs here ...
# On success — report completion with structured metadata
curl -fsS --retry 3 -X POST \
-H "Content-Type: application/json" \
-d '{"message":"Processed 1,423 records","duration_ms":4200,"metadata":{"env":"prod","version":"2.1.0"}}' \
https://app.flarewarden.com/ping/YOUR-UUID/complete
# On failure — report error details
curl -fsS --retry 3 -X POST \
-H "Content-Type: application/json" \
-d '{"message":"Connection refused","exit_code":1}' \
https://app.flarewarden.com/ping/YOUR-UUID/fail
Platform Integration
Crontab, shell scripts, Kubernetes, and Docker
Examples for common schedulers and orchestration platforms.
Wrap your existing cron command with start and complete pings:
# Database backup: every day at 2 AM
# Pattern: start → run job → complete on success, fail on error
# A crontab entry must fit on one line; cron does not support backslash line continuation.
# The start ping is followed by ";" so a failed ping never prevents the backup from running.
0 2 * * * /usr/bin/curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID/start; /usr/local/bin/backup.sh && /usr/bin/curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID || /usr/bin/curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID/fail
# Quick job: heartbeat only, no start signal needed
*/15 * * * * /usr/local/bin/sync-cache.sh && /usr/bin/curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID
Alerting
How cron monitor alerts work
Cron monitors use the same alerting system as all other FlareWarden monitor types. When a monitor transitions to Late or Failed, alerts are sent through your configured channels:
- Email notifications to team members
- Webhook deliveries with JSON payloads
- Status page updates (if the cron monitor is linked to a service)
Recovery alerts are sent when a previously late or failed monitor receives a successful ping.
Status Page Integration
Displaying cron monitor status on public pages
Cron monitors can be displayed on your public status pages alongside uptime and other monitor types. The cron monitor status is shown as part of its parent uptime monitor's service group, giving your users visibility into background job health.
Tip: When a cron monitor is late or failing, its parent service on the status page reflects the degraded state (unless the monitor's severity is Notify Only, which leaves the status page unaffected), so external users see one unified view of service health.
Billing & Limits
How cron monitors count against your plan
Cron monitors share the same unified monitor pool as uptime, content, and dependency monitors. Each cron monitor counts as one monitor against your plan's limit.
Example: On a plan with 50 monitors, you could have 30 uptime monitors, 10 content monitors, 5 dependency monitors, and 5 cron monitors.
Best Practices
Tips for reliable cron monitoring
Always use --retry 3
Network blips happen. The --retry flag re-sends the ping after a transient failure, so a brief network error is less likely to cause a false alert.
Use the start signal for long-running jobs
The /start ping lets FlareWarden detect hung jobs that begin but never complete.
Set an appropriate grace period
A grace period that's too short causes false alerts. A good rule is 2× the typical variance in your job's start time.
Send error details with fail pings
POST a body with your /fail ping containing the error message or exit code. This appears in the monitor's event log for quick debugging.
Don't let pings block your job
Use the -fsS flags with curl so a failed ping stays quiet, and cap how long it can take with a short timeout such as -m 10. The ping should never prevent your actual job from running.
Troubleshooting & FAQ
Common issues and frequently asked questions
Common Issues
404 My ping returns "404 Not Found"
A 404 means the UUID in your URL does not match any active cron monitor. Common causes:
- The monitor was deleted and re-created. Deleting a monitor invalidates its UUID. Copy the new ping URL from the replacement monitor's detail page.
- The UUID is truncated or has extra characters. Verify your URL exactly matches the one shown on the dashboard — it must be 36 characters including hyphens. Shell variable expansion or line wrapping can silently corrupt a UUID.
- Wrong account. If you manage multiple accounts, check that the ping token came from the right one.
To diagnose, run curl in verbose mode and inspect the response body:
curl -v https://app.flarewarden.com/ping/YOUR-UUID
400 My ping returns "invalid ping token format"
The UUID in the path must match the pattern xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (lowercase hexadecimal, exactly 36 characters with hyphens in positions 9, 14, 19, and 24).
Typical causes:
- The URL was copy-pasted with surrounding whitespace or a trailing newline.
- A shell variable wasn't quoted and word-splitting broke the UUID.
- The path contains URL-encoded characters (e.g. %7B instead of {) because the UUID placeholder wasn't replaced.
400 My ping body is rejected with a validation error
The response body's errors array tells you exactly which field failed and why. Common fixes:
- message too long. Truncate your error output to under 1,000 characters before sending. In Bash: MSG="${ERR:0:900}"
- exit_code out of range. Clamp to the valid range in shell: EXIT=$(( $? & 255 ))
- duration_ms is negative. Ensure you compute end − start (not the reverse) when measuring elapsed time.
- Too many metadata keys or values too long. Maximum 5 keys; each key up to 100 characters, each value up to 200 characters.
- Unknown fields in the JSON body. The JSON decoder is strict. Remove any fields not listed in the request body schema.
- Body exceeds 10 KB. Don't send full stack traces or log output. Summarise the error in a short message and include the job ID in metadata for cross-referencing.
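The first two fixes in this list can be wrapped in small helpers that also work in shells without Bash's substring syntax. This is a sketch: the function names truncate_message and clamp_exit_code are hypothetical, and printf's precision counts bytes in some shells and characters in others, so multi-byte text may be cut slightly shorter than 1,000 characters.

```shell
# truncate_message TEXT
# Prints at most the first 1000 bytes/characters of TEXT.
truncate_message() {
  printf '%.1000s' "$1"
}

# clamp_exit_code CODE
# Prints CODE masked into the range 0..255.
clamp_exit_code() {
  echo $(( $1 & 255 ))
}
```

A typical use is `MSG=$(truncate_message "$ERR")` followed by a /fail ping that sends MSG with --data-urlencode.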
429 My ping returns "429 Too Many Requests"
You're sending more than 60 requests per minute from the same IP. Check the Retry-After response header for how many seconds to wait before retrying. Common causes of unexpected rate-limit hits:
- A retry loop is running with no backoff, so pings fire as fast as the network allows. Add exponential backoff or use --retry-delay 5 with curl.
- Many servers or containers pinging through the same NAT/egress IP. Consider consolidating pings or using a different egress path.
- A bug causes the ping call to execute many times per run (e.g. inside a loop).
My monitor shows "Late" even though the job ran successfully
This means the /complete ping arrived after the grace period elapsed. FlareWarden checks for overdue monitors every 30 seconds and transitions to Late at next_expected_at + grace_period.
- Grace period is too short. If your job's start time drifts (e.g. cron load on a busy host), increase the grace period to at least 2× the typical variance. For hourly jobs, 5–10 minutes is a good starting point.
- Ping is sent at the end of a long job. Add a /start ping at the beginning so FlareWarden sees the job is running, then send /complete at the end.
- Server clock skew. FlareWarden uses UTC. Verify your server clock is synchronised (ntpdate -q pool.ntp.org).
- Schedule is misconfigured. If the stored interval doesn't match how often the job actually runs, the next expected time is calculated incorrectly. Update the schedule on the monitor's settings page.
A successful /complete ping always returns the monitor to OK and advances the next expected time, regardless of current state.
My monitor is stuck in "Running" state
The monitor received a /start ping but no /complete or /fail has arrived. Possible causes:
- The job crashed before reaching the success or failure ping.
- The success ping failed silently because --retry wasn't used and a transient network error dropped it.
- Your script's error handling doesn't cover all exit paths, so the fail ping is only called for some errors.
If a Max Run Duration is configured, the monitor transitions to Failed automatically when it's exceeded. Without that setting, the monitor stays Running until a signal arrives. Send a /complete ping manually to reset it, then fix the script to send signals on all exit paths.
The ping curl call is failing my job's exit code check
Cron pings should never prevent the actual job from running or affect its reported exit code. There are two patterns:
Discard curl's exit status entirely:
curl -fsS --retry 3 "$PING_URL" || true
Capture job exit code before sending the ping:
/usr/local/bin/my-job.sh
JOB_EXIT=$?
if [ "$JOB_EXIT" -eq 0 ]; then
  curl -fsS --retry 3 "$PING_URL" || true
else
  curl -fsS --retry 3 "$PING_URL/fail?exit_code=$JOB_EXIT" || true
fi
exit "$JOB_EXIT"
The flags -fsS mean: fail silently on HTTP errors (-f), suppress the progress meter (-s), but still show errors (-S).
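The capture-then-ping pattern can be packaged as a reusable function so every job gets start, complete, and fail signals without repeating the logic. This is a minimal sketch: the function name run_monitored is hypothetical, the 10-second timeout is an arbitrary choice, and the job's own exit code is always what the caller sees.

```shell
# run_monitored PING_URL COMMAND [ARGS...]
# Sends /start, runs COMMAND, then sends /complete or /fail.
# Returns COMMAND's exit code; ping failures are ignored.
run_monitored() {
  ping_url=$1
  shift
  curl -fsS -m 10 --retry 3 "$ping_url/start" >/dev/null || true
  job_exit=0
  "$@" || job_exit=$?   # capture the exit code without aborting under set -e
  if [ "$job_exit" -eq 0 ]; then
    curl -fsS -m 10 --retry 3 "$ping_url/complete" >/dev/null || true
  else
    curl -fsS -m 10 --retry 3 "$ping_url/fail?exit_code=$job_exit" >/dev/null || true
  fi
  return "$job_exit"
}
```

A cron entry can then call a small script that sources this function and runs `run_monitored "https://app.flarewarden.com/ping/YOUR-UUID" /usr/local/bin/my-job.sh`.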
Frequently Asked Questions
What happens if my ping URL is exposed?
Can I monitor jobs that don't run on a fixed schedule?
Is there a rate limit on pings?
Yes. Ping endpoints allow 60 requests per minute per source IP. If you exceed that, check the Retry-After header for when to retry and review the troubleshooting section above for common causes.
Do cron monitors work with the management API?
The ping_url field in the API response contains the full ready-to-use URL so you can automate provisioning without ever opening the dashboard.
How is the ping routed to the correct region?
If a ping arrives at a Fly.io region that doesn't own the monitor's data, FlareWarden sets the fly-replay response header and Fly's infrastructure re-routes the request to the correct region transparently. From your job's perspective, you just send one curl and get back a 200.
What is the "Pending" state and when does it end?
A newly created monitor starts in Pending and waits for its first /complete ping before beginning to track the schedule. No alerts fire during Pending; it simply means the monitor hasn't been established yet. Once the first complete ping arrives, the monitor moves to OK and starts the expect-then-verify loop.
What does "hung job detection" mean and how do I enable it?
If you configure a Max Run Duration on the monitor and a job sends a /start ping but stays in Running state longer than that limit, FlareWarden marks it Failed and opens a "hung job" incident, even though no /fail ping was received. The check runs every 30 seconds. This catches processes that get stuck waiting on a resource (network timeout, database lock, external API, etc.) without crashing.
To enable it, open the monitor's settings, set a Max Run Duration slightly above your job's worst-case expected runtime, and ensure your job sends a /start ping at the beginning.
Can I send structured metadata with pings?
Yes. Include a metadata key in your JSON body with up to 5 string key-value pairs. Metadata is stored with the ping result and visible in the run history on the monitor's detail page. Useful for: deployment version, environment name, record count processed, source S3 bucket, job run ID.
curl -X POST -H "Content-Type: application/json" \
  -d '{"metadata":{"version":"1.4.2","env":"production","records":"8431","job_id":"run-abc123"}}' \
  https://app.flarewarden.com/ping/YOUR-UUID
Note: metadata is only available via JSON body; it cannot be sent as a query parameter or form field.
Can I send error details without writing JSON?
Yes. Use query parameters for the simplest possible integration — no JSON serialisation, no Content-Type header needed:
# URL-encode spaces as + and special characters as %XX
curl -fsS "https://app.flarewarden.com/ping/YOUR-UUID/fail?message=backup+failed%3A+disk+full&exit_code=28"
Or use URL-encoded form data with curl's --data-urlencode for automatic encoding:
curl -fsS -X POST \
  --data-urlencode "message=backup failed: no space left on device" \
  --data-urlencode "exit_code=28" \
  https://app.flarewarden.com/ping/YOUR-UUID/fail
The metadata field requires JSON and is not available via query parameters or form encoding.
Never miss a failed cron job again
Set up your first cron monitor in under a minute. No agents to install, no complex configuration — just a single curl call.