Cron Monitors

Monitor scheduled jobs, cron tasks, and background workers. FlareWarden detects missed runs, hung jobs, and explicit failures so you know your critical background processes are running on schedule.

Quick Start: Add a Cron Monitor in 60 Seconds

  1. Open a parent uptime monitor and click Add Cron Monitor
  2. Set a name, expected schedule, and grace period
  3. Copy the unique ping URL
  4. Add a single curl call to your cron job

Overview

What cron monitors are and why they matter

Most monitoring tools watch from the outside — they periodically request a URL and check whether it responds. This works well for web services, but it can't observe jobs that run internally on your infrastructure: database backups, report generators, queue workers, data sync tasks, and the dozens of other scheduled processes that keep a production system healthy.

When these jobs fail silently, the consequences range from stale data to data loss to cascading service degradation — and you may not find out until a customer reports a problem days later. Cron monitors solve this blind spot.

A cron monitor in FlareWarden is a lightweight heartbeat check for a single scheduled job. Your job sends a small HTTP ping to a unique URL when it starts and when it completes (or fails). FlareWarden tracks these signals and alerts you when:

  • A job doesn't run at all (missed run)
  • A job starts but never finishes (hung job)
  • A job explicitly reports an error (explicit failure)

Because cron monitors are nested under your existing uptime monitors, all your background job health surfaces in the same dashboard and status page as your web services — giving you one unified view of your infrastructure.

How Cron Monitoring Works

Push-based model and the expect-then-verify loop

Push-Based Model

Unlike uptime monitors that poll your services, cron monitors use a push model. FlareWarden doesn't reach out to your servers — your jobs reach out to FlareWarden. This means:

No firewall changes required

Outbound HTTPS from your servers is all that's needed. FlareWarden never needs inbound access to your network.

Works anywhere jobs run

Linux crontab, Kubernetes CronJobs, GitHub Actions scheduled workflows, serverless functions — if it can make an outbound HTTP request, it can send a ping.

No agent to install

A single curl command is the entire integration. No daemon, no SDK dependency, no credentials file.

UUID as bearer token

Each cron monitor gets a unique UUID baked into its ping URL. The URL itself is the credential — no API keys or authentication headers needed.

The Expect-then-Verify Loop

When you create a cron monitor you tell FlareWarden two things: how often your job runs and how long to wait before marking it late. FlareWarden uses these to calculate a next expected time for each run. If no successful ping arrives by that time plus the grace period, the monitor transitions to Late and an alert fires.

Every time a successful complete ping arrives, FlareWarden advances next expected time to the next scheduled occurrence, keeping the monitor perpetually tracking the future.

This loop — set expectation, wait, verify, advance — is what lets FlareWarden detect missed runs with no external polling.
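
The lateness check in this loop can be sketched as a small shell function. This is only an illustration, not FlareWarden's implementation: the name monitor_status, the use of Unix epoch seconds, and treating a run as late only once the deadline has strictly passed are all assumptions.

```shell
#!/bin/sh
# Illustrative sketch only; monitor_status is a hypothetical name.
# All times are Unix epoch seconds.
#   $1 = next expected time, $2 = grace period (s), $3 = current time
monitor_status() {
    deadline=$(( $1 + $2 ))
    # Assumption: the run counts as late once the deadline has passed.
    if [ "$3" -gt "$deadline" ]; then
        echo "late"
    else
        echo "ok"
    fi
}

# Job expected at t=1000 with a 300 s grace period (deadline is 1300):
monitor_status 1000 300 1200   # prints "ok"
monitor_status 1000 300 1400   # prints "late"
```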

Heartbeat Monitoring

Simplified monitoring for always-running processes

What Is Heartbeat Monitoring?

Heartbeat monitoring is a simplified monitoring mode designed for always-running processes — background workers, queue consumers, daemons, and long-running services. Instead of the full start/complete/fail lifecycle used by cron jobs, your process simply pings a single URL at regular intervals. If FlareWarden stops receiving pings, it knows the process has stalled or crashed and fires an alert.

Cron Monitoring vs. Heartbeat Monitoring

Both modes use the same push-based model, but they're designed for different workloads:

Cron Monitoring

For scheduled jobs with a defined lifecycle: start, complete, or fail. Uses cron expressions (*/15 * * * *) or fixed intervals to define when runs are expected. Detects missed runs, hung jobs, and explicit failures.

Heartbeat Monitoring

For continuously-running processes that periodically check in. No start/complete/fail signals needed — just a single ping at a regular interval. If the ping stops arriving, the monitor transitions to Late and alerts fire.

Setup

  1. Create a cron monitor and select Heartbeat as the schedule type.
  2. Choose a check-in interval — how often your process should ping (e.g., every 60 seconds).
  3. Set a grace period — this should be longer than your typical deploy/restart time to avoid false alerts during deployments.
  4. Integrate with a single curl call:
    curl -fsS -m 10 https://app.flarewarden.com/ping/YOUR-TOKEN

Grace Period Guidance

Tip: When setting the grace period for heartbeat monitors, account for the time your service takes to restart during deploys. For example, if your deployments typically take 3 minutes, set the grace period to at least 5 minutes. This prevents false alerts during routine deployments.

Integration Examples

Each example sends a heartbeat ping inside a loop. The ping URL is the only integration point — add it alongside your existing process loop and you're done.

#!/bin/bash
PING_URL="https://app.flarewarden.com/ping/YOUR-TOKEN"

while true; do
    # Your process work here
    do_work

    # Send heartbeat
    curl -fsS -m 10 "$PING_URL" > /dev/null
    sleep 60
done

Signal Lifecycle & State Machine

Five states, three signals, deterministic transitions

A cron monitor always exists in one of five states. Transitions are deterministic: every state change is triggered by a specific event (a ping arriving, or time passing). Understanding the state machine helps you configure the right grace period and interpret what you see in the dashboard.

Start Signal

Ping /start when your job begins to track execution duration and detect hung processes. Optional, but recommended for long-running jobs.

Complete Signal

Ping the base URL or /complete when the job finishes successfully. This is the primary health signal and advances the next expected run time.

Fail Signal

Ping /fail to explicitly report a failure with an optional error message body (up to 10 KB). Triggers an immediate alert.

The Five States

| Status | Meaning | Triggered By | Alerts? |
| --- | --- | --- | --- |
| Pending | Newly created; awaiting first ping | Monitor creation (initial state) | No |
| OK | Job completed successfully and on time | /complete ping received within the expected window | Recovery alert if previously degraded |
| Running | Job has started but not yet completed | /start ping received | No (unless max run duration exceeded) |
| Late | Expected ping not received within the grace period | No ping received after schedule + grace period elapses | Yes: "missed run" alert |
| Failed | Job explicitly failed, or ran too long and hung | /fail ping received, or max run duration exceeded | Yes: "failure" or "hung job" alert |

State Transitions in Detail

All transitions follow these deterministic rules:

Pending → OK

The very first /complete ping received after monitor creation moves the monitor from Pending to OK and sets the first next expected time. No alert fires; this is simply the monitor coming online.

OK → Running

A /start ping moves the monitor to Running. FlareWarden records the start timestamp and begins measuring execution duration. If a max run duration is configured, the clock starts ticking.

Running → OK

A /complete ping closes the run, records the execution duration, and advances next expected time. The monitor returns to OK. If an incident was open, it is resolved and a recovery alert fires.

OK → Late (missed run)

If no ping arrives by next_expected_at + grace_period, FlareWarden's background checker (which runs every 30 seconds) transitions the monitor to Late, opens an incident, and fires a "missed run" alert. The monitor stays Late until it receives a complete or start ping.

Running → Failed (hung job)

If a max run duration is configured and the job has been in Running state longer than that limit, FlareWarden marks it Failed and opens a "hung job" incident. This catches processes that started but got stuck in an infinite loop or waiting for a resource that never responds.

Failed / Late → OK (recovery)

From any unhealthy state, a /complete ping returns the monitor to OK. The open incident is resolved, next expected time is recalculated, and a recovery alert notifies your team that the job is healthy again.

Tip: Consecutive failures and missed runs are tracked separately (consecutive_failures and consecutive_misses). These counters help you distinguish a one-off hiccup from a recurring problem and are visible on the monitor's detail page.
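
The transitions in this section can be summarised as a lookup from (current state, event) to next state. The sketch below is an illustration under stated assumptions, not FlareWarden's code: next_state and the event names are hypothetical, and accepting a start ping from any state is an assumption where the rules above do not spell out a case.

```shell
#!/bin/sh
# Illustrative sketch of the documented transitions; next_state and the
# event names are hypothetical, not part of FlareWarden's API.
#   $1 = current state: pending | ok | running | late | failed
#   $2 = event: start | complete | fail | deadline_passed | max_run_exceeded
next_state() {
    case "$2" in
        complete) echo "ok" ;;        # a complete ping always returns to OK
        fail)     echo "failed" ;;    # explicit failure
        start)    echo "running" ;;   # assumption: accepted from any state
        deadline_passed)
            # Only an OK monitor is documented as going Late.
            if [ "$1" = "ok" ]; then echo "late"; else echo "$1"; fi ;;
        max_run_exceeded)
            # Only a Running monitor can be marked as hung.
            if [ "$1" = "running" ]; then echo "failed"; else echo "$1"; fi ;;
        *)  echo "$1" ;;              # unknown event: state unchanged
    esac
}

next_state pending complete          # prints "ok"
next_state running max_run_exceeded  # prints "failed"
```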

Nesting Under Parent Uptime Monitors

How cron monitors relate to parent uptime monitors

Every cron monitor belongs to a parent uptime monitor. This design reflects how infrastructure actually works: a website usually has a collection of background jobs that support it — cache warmers, email queues, nightly report generators — and all of these are conceptually part of the same service.

Why Nesting?

Unified dashboard view

Your uptime monitor's detail page shows all its cron monitors in one place. You don't need to navigate to a separate section to see background job health.

Status page coherence

When a cron monitor is unhealthy, the parent service on your public status page can reflect that degraded state automatically — depending on the cron monitor's configured severity.

Shared alert routing

Alerts fire through the same channels (email, webhook) as the parent uptime monitor, so your team receives cron failures in the same place as all other FlareWarden alerts.

Shared monitor pool

Cron monitors count against the same unified monitor limit as uptime, content, and dependency monitors — no separate quota to manage.

Severity Levels & Parent Status Roll-up

Each cron monitor has a severity setting that controls how its failures affect the parent uptime monitor's status and the urgency of alerts:

| Severity | Effect on Parent Monitor | Use When |
| --- | --- | --- |
| Critical | Parent marked Down; status page shows the service as fully offline; high-priority alert dispatched | The job is essential and its failure directly impacts users (e.g. payment processing, user email delivery) |
| Degraded | Parent marked Degraded (partial outage); status page shows the service as impaired; standard alert dispatched | The job's failure degrades service quality but doesn't take the site fully down (e.g. cache warming, report generation) |
| Notify Only | Parent status unchanged; status page unaffected; alert still dispatched internally | The job is internal infrastructure whose failure is invisible to end users (e.g. log rotation, analytics aggregation) |

When multiple cron monitors are failing simultaneously, the parent monitor reflects the worst severity among all open incidents. For example, if one Degraded and one Critical cron monitor are both failing, the parent shows as Down. Once the critical incident resolves, the parent automatically recalculates and returns to Degraded state.
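
The "worst severity wins" rule can be sketched as a short shell function. This is a minimal illustration: parent_status and the lowercase labels are hypothetical, "up" is an assumed name for a healthy parent, and the sketch ignores the parent's own uptime check.

```shell
#!/bin/sh
# Illustrative sketch; parent_status and the lowercase labels are
# hypothetical. Arguments are the severities of cron monitors that
# currently have an open incident: critical | degraded | notify_only
parent_status() {
    result="up"   # assumption: "up" stands for a healthy parent
    for severity in "$@"; do
        case "$severity" in
            critical) result="down" ;;
            degraded) [ "$result" = "down" ] || result="degraded" ;;
            notify_only) ;;   # never changes the parent's status
        esac
    done
    echo "$result"
}

parent_status degraded critical   # prints "down"
parent_status degraded            # prints "degraded"
```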

Default severity: New cron monitors default to Degraded. Change the severity from the cron monitor's settings at any time without losing historical data.

Ping API Reference

Endpoints, request format, response codes, and rate limits

Each cron monitor gets a unique UUID-based ping URL. No API keys or authentication headers are required — the UUID itself acts as the token. Pings can be as simple as a single curl call with no body or headers.

Base URL: https://app.flarewarden.com/ping/{uuid}
Replace {uuid} with your cron monitor's unique ping token, available on the monitor's detail page and in the ping_url field of management API responses.

Endpoint Overview

| Method(s) | Path | Signal | Use When |
| --- | --- | --- | --- |
| GET, POST, HEAD | /ping/{uuid} | Complete | Heartbeat: job finished successfully. Most common for quick jobs with no start signal. |
| GET, POST, HEAD | /ping/{uuid}/start | Start | Job began execution. Enables duration tracking and hung-job detection (optional). |
| GET, POST, HEAD | /ping/{uuid}/complete | Complete | Explicit success signal. Functionally identical to the base URL; use for readability in scripts that also send /start. |
| GET, POST, HEAD | /ping/{uuid}/fail | Fail | Job failed explicitly. Triggers an immediate alert. Accepts an optional error message body. |

Path Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| uuid | UUID v4 string | Yes | The unique ping token for this cron monitor. Format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (36 characters: 32 hexadecimal digits plus 4 hyphens, case-insensitive). Found on the monitor's detail page and in the ping_url field of management API responses. |
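
A job script can check the token's shape before it ever sends a ping, which catches truncated or unquoted values early. The helper below is a sketch: is_valid_token is a hypothetical name, and it checks only the 36-character layout described above (case-insensitively), not the UUID version bits or whether the monitor exists.

```shell
#!/bin/sh
# Illustrative pre-flight check; is_valid_token is a hypothetical helper.
# Returns 0 when $1 looks like a UUID (36 characters: 32 hex digits and
# 4 hyphens), matched case-insensitively.
is_valid_token() {
    [ "${#1}" -eq 36 ] || return 1
    printf '%s\n' "$1" | grep -Eqi \
        '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

is_valid_token "123e4567-e89b-42d3-a456-426614174000" && echo "valid"
is_valid_token "123e4567-e89b-42d3-a456" || echo "invalid"
```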

Request Body (Optional)

All body fields are optional — a ping with no body at all is perfectly valid. The body can be submitted as JSON (Content-Type: application/json), URL-encoded form data (Content-Type: application/x-www-form-urlencoded), or as query parameters appended to the URL. JSON is recommended when sending structured fields like metadata.

| Field | Type | Constraints | Description |
| --- | --- | --- | --- |
| message | string | Max 1,000 characters | Human-readable status message. For fail pings, describe the error. For complete pings, summarize what was processed. Stored in the run history and included in alert notifications. |
| error_message | string | Max 1,000 characters | Legacy alias for message (kept for backward compatibility). If both fields are present, message takes precedence. |
| exit_code | integer | −128 to 255 | The process exit code. Stored in the run history for debugging. Standard Unix convention: 0 = success, non-zero = failure. |
| duration_ms | integer | 0 to 2,592,000,000 (30 days in ms) | Wall-clock execution time in milliseconds. If omitted on a /complete ping that follows a /start ping, FlareWarden calculates the duration automatically from the recorded start timestamp. |
| metadata | object (JSON only) | Max 5 keys; key: max 100 chars; value: max 200 chars | Arbitrary key-value pairs stored with the ping result and visible in the run history. Useful for tagging runs with deployment version, environment, record count, or other job-specific context. Available via JSON body only (not as query parameters or form fields). |

Body size limit: The total request body must not exceed 10 KB. Bodies exceeding this limit are rejected with HTTP 400. JSON bodies are strictly validated — unknown fields cause a 400 error.
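
Because an oversized body is rejected outright, it can help to measure the body before sending it. The following is a minimal sketch: body_fits is a hypothetical helper, and reading "10 KB" as 10 × 1024 = 10240 bytes is an assumption.

```shell
#!/bin/sh
# Illustrative pre-flight check; body_fits is a hypothetical helper.
# Assumption: "10 KB" means 10 * 1024 = 10240 bytes.
MAX_BODY_BYTES=10240

# Returns 0 when the request body in $1 is within the size limit.
body_fits() {
    # Arithmetic expansion strips any padding that wc adds on some systems.
    size=$(( $(printf '%s' "$1" | wc -c) ))
    [ "$size" -le "$MAX_BODY_BYTES" ]
}

body='{"message":"Processed 1,423 records","exit_code":0}'
if body_fits "$body"; then
    echo "body ok"
else
    echo "body too large, shorten the message" >&2
fi
```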

Response Codes

| Code | Meaning | When It Occurs |
| --- | --- | --- |
| 200 OK | Ping accepted | The signal was received and processed. The JSON body contains status: "ok" and a ping_id uniquely identifying the recorded ping result. |
| 202 Accepted | Regional replay | The request arrived at a Fly.io region that doesn't own this monitor's data. The fly-replay response header is set and Fly's infrastructure re-routes the request automatically. You will never see this code directly; the replay is transparent to the caller. |
| 400 Bad Request | Invalid request | The UUID is malformed, the body is not valid JSON, the body exceeds 10 KB, a field value violates a constraint (message too long, exit_code out of range, unknown JSON field, etc.), or there is unexpected data after the JSON value. See the validation error format below for structured error details. |
| 404 Not Found | Unknown token | No active cron monitor exists for this UUID. The monitor may have been deleted, or the UUID may be truncated or incorrect. If you recently deleted and re-created the monitor, use the new ping URL. |
| 429 Too Many Requests | Rate limited | More than 60 pings per minute from the same source IP. Check the Retry-After response header for how many seconds to wait before retrying. Normal cron jobs are never rate-limited; see the rate limiting section below. |
| 500 Internal Server Error | Server error | FlareWarden encountered an unexpected error. The ping may not have been recorded. Retry with backoff using --retry 3. If the problem persists, check status.flarewarden.com. |
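
A script that inspects the status code itself might group the codes by what the caller should do next. This is one possible grouping, not official guidance: ping_action is a hypothetical helper, and treating unlisted codes as transient is an assumption.

```shell
#!/bin/sh
# Illustrative sketch; ping_action is a hypothetical helper that maps an
# HTTP status code from the response code table to a next step.
ping_action() {
    case "$1" in
        200)     echo "done" ;;          # ping accepted
        429|500) echo "retry" ;;         # rate limited or server error: back off, then retry
        400|404) echo "fix-request" ;;   # retrying the same request will not help
        *)       echo "retry" ;;         # assumption: treat anything else as transient
    esac
}

# To obtain the status code from a real ping (not run here):
#   code=$(curl -sS -m 10 -o /dev/null -w '%{http_code}' "$PING_URL")
ping_action 200   # prints "done"
ping_action 404   # prints "fix-request"
```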

Response Format

All ping endpoints return Content-Type: application/json. Every response includes a status field of either "ok" or "error".

Success — HTTP 200
{
  "status": "ok",
  "message": "complete signal received",
  "ping_id": "01j9abc123def456"
}

ping_id uniquely identifies the recorded ping result row. Use it to correlate dashboard events with your job logs for debugging.
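
To write the ping_id into a job log without a JSON tool, a sed expression is enough for this flat response shape. The helper below is a sketch: extract_ping_id is a hypothetical name, and the approach is not a general JSON parser.

```shell
#!/bin/sh
# Illustrative sketch; extract_ping_id is a hypothetical helper.
# It pulls the ping_id string out of a success response with sed, which
# is enough for this flat JSON shape but is not a general JSON parser.
extract_ping_id() {
    printf '%s\n' "$1" |
        sed -n 's/.*"ping_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

response='{"status":"ok","message":"complete signal received","ping_id":"01j9abc123def456"}'
echo "ping_id=$(extract_ping_id "$response")"   # prints "ping_id=01j9abc123def456"
```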

Error — HTTP 4xx / 5xx
{
  "status": "error",
  "message": "not found"
}

Simple error for most failures. The message field is human-readable and safe to log.

Paused Monitor — HTTP 200
{
  "status": "ok",
  "message": "monitor is paused, ping recorded but not processed"
}

A paused monitor still returns 200 so your job continues unmodified. The state machine is not updated and no alerts fire while paused.

Validation Error — HTTP 400
{
  "status": "error",
  "message": "message: must be at most 1000 characters",
  "errors": [
    {
      "field": "message",
      "message": "must be at most 1000 characters"
    }
  ]
}

Validation failures include a structured errors array with one entry per invalid field. The top-level message summarises the first error for simple clients.

Rate Limiting

Ping endpoints are protected by a per-IP token-bucket rate limiter to prevent abuse and ensure fair service for all customers.

| Limit | Window | Scope | Action When Exceeded |
| --- | --- | --- | --- |
| 60 requests | 1 minute | Per source IP address | HTTP 429 with Retry-After header indicating seconds until the window resets |

A typical cron job running every minute sends at most 2 pings per cycle (start + complete), which is 2 pings per minute (120 per hour) from a single IP and well within the 60-per-minute limit. The rate limiter only activates for pathological traffic patterns such as tight retry loops or large numbers of jobs sharing a single NAT egress IP that collectively exceed 60 pings/min.
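
For fleets that share one egress IP, a rough estimate of the combined ping rate shows how close you are to the limit. The arithmetic below is a sketch: pings_per_minute is a hypothetical helper, and it assumes runs are spread evenly over time.

```shell
#!/bin/sh
# Illustrative arithmetic; pings_per_minute is a hypothetical helper.
#   $1 = number of jobs sharing one egress IP
#   $2 = pings per run (e.g. 2 for start + complete)
#   $3 = minutes between runs
# Assumption: runs are spread evenly; the result is rounded up.
pings_per_minute() {
    echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

LIMIT=60
rate=$(pings_per_minute 200 2 5)   # 200 jobs, each running every 5 minutes
echo "$rate pings/min"             # prints "80 pings/min"
[ "$rate" -le "$LIMIT" ] || echo "over the limit, expect HTTP 429"
```

This is an average. If every job fires at the same minute boundary, the burst is higher than the estimate, so staggering schedules helps as well.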

Endpoint Details & Examples

GET POST HEAD /ping/{uuid} — Heartbeat / success

The simplest way to report a successful run. Send at the end of your job. Equivalent to /ping/{uuid}/complete. Resolves any open incident and advances the next expected run time.

Minimal — GET with no body

curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID

With structured JSON body

curl -fsS --retry 3 -X POST \
  -H "Content-Type: application/json" \
  -d '{"message":"Processed 1,423 records","duration_ms":4200,"metadata":{"env":"prod","version":"2.1.0"}}' \
  https://app.flarewarden.com/ping/YOUR-UUID

With query parameters (zero-dependency)

curl -fsS --retry 3 "https://app.flarewarden.com/ping/YOUR-UUID?message=done&duration_ms=4200"

GET POST /ping/{uuid}/start — Signal job start (optional)

Send at the beginning of your job. Transitions the monitor to Running, records the start timestamp, and starts the hung-job clock (if a max run duration is configured). If no /complete or /fail ping follows within the max run duration, the monitor is marked Failed automatically.

Minimal — GET with no body

curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID/start

GET POST /ping/{uuid}/complete — Explicit success signal

Functionally identical to the base URL ping. Prefer this form when your script also calls /start so the symmetry is clear. If a /start was sent and duration_ms is omitted, FlareWarden auto-computes the duration from the start timestamp.

With exit code and duration

curl -fsS --retry 3 -X POST \
  -H "Content-Type: application/json" \
  -d '{"exit_code":0,"duration_ms":12500,"message":"Synced 3 tables"}' \
  https://app.flarewarden.com/ping/YOUR-UUID/complete

GET POST /ping/{uuid}/fail — Report explicit failure

Marks the job as Failed and triggers an immediate alert via all configured channels. The optional error message is stored in the run history and included in alert notifications. Use this in the error-handling path of your job so FlareWarden receives an explicit failure signal rather than waiting for the grace period to expire.

Simple failure — no body

curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID/fail

Failure with structured JSON error details

curl -fsS --retry 3 -X POST \
  -H "Content-Type: application/json" \
  -d '{"message":"Database connection refused after 3 retries","exit_code":1,"metadata":{"host":"db-primary","attempt":"3"}}' \
  https://app.flarewarden.com/ping/YOUR-UUID/fail

Failure via query parameters (no JSON needed)

curl -fsS --retry 3 \
  "https://app.flarewarden.com/ping/YOUR-UUID/fail?message=disk+full&exit_code=1"

Failure via URL-encoded form data

curl -fsS --retry 3 -X POST \
  --data-urlencode "message=backup failed: no space left on device" \
  --data-urlencode "exit_code=28" \
  https://app.flarewarden.com/ping/YOUR-UUID/fail

Configuration

Schedule, grace period, and max run duration settings

Schedule

Define how often your job is expected to run. FlareWarden supports two schedule formats:

Cron Expression

Standard 5-field cron syntax for precise schedules.

*/15 * * * *

Every 15 minutes

Simple Interval

Human-readable intervals for common schedules.

every 1 hour

Runs once per hour

Grace Period

A buffer window after the expected run time before the monitor is marked as Late. This accounts for natural timing variance in cron execution.

Default: 5 minutes. Can be set from 0 to 24 hours. A grace period of 0 means the monitor transitions to Late immediately when the expected time passes without a ping.

Max Run Duration

An optional upper bound on how long a job should run. If a job sends a /start ping and stays in Running state longer than this limit, FlareWarden marks it Failed and opens a "hung job" incident.

Default: Disabled (0). Enable only for jobs where an unusually long run is itself a signal of failure.
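
The hung-job rule, including the "0 means disabled" default, can be sketched as follows. This is only an illustration: is_hung is a hypothetical helper and the epoch-second timestamps are an assumption.

```shell
#!/bin/sh
# Illustrative sketch; is_hung is a hypothetical helper.
#   $1 = start ping time, $2 = max run duration (s, 0 = disabled),
#   $3 = current time; times are Unix epoch seconds.
# Returns 0 when the run has exceeded its max run duration.
is_hung() {
    [ "$2" -gt 0 ] || return 1          # 0 disables the check
    [ $(( $3 - $1 )) -gt "$2" ]
}

is_hung 1000 600 1700 && echo "hung"            # running 700 s, limit 600 s
is_hung 1000 0 99999  || echo "check disabled"
```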

Parent Monitor

Cron monitors are nested under a parent uptime monitor. This groups related monitors together — for example, a website's uptime check and its background job monitors appear together in the dashboard and on status pages. See the Nesting Under Parent Uptime Monitors section above for details on how severity levels affect the parent's status.

Integration Examples

Code examples in 8 languages with full lifecycle patterns

Adding cron monitoring takes a single HTTP request. Pick your language and copy the full-lifecycle pattern — start, run your job, then report success or failure.

# Simple heartbeat — fire and forget after your job completes
curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID

# Full lifecycle with start/complete/fail signals
curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID/start

# ... your job runs here ...

# On success — report completion with structured metadata
curl -fsS --retry 3 -X POST \
  -H "Content-Type: application/json" \
  -d '{"message":"Processed 1,423 records","duration_ms":4200,"metadata":{"env":"prod","version":"2.1.0"}}' \
  https://app.flarewarden.com/ping/YOUR-UUID/complete

# On failure — report error details
curl -fsS --retry 3 -X POST \
  -H "Content-Type: application/json" \
  -d '{"message":"Connection refused","exit_code":1}' \
  https://app.flarewarden.com/ping/YOUR-UUID/fail

Platform Integration

Crontab, shell scripts, Kubernetes, and Docker

Examples for common schedulers and orchestration platforms.

Wrap your existing cron command with start and complete pings:

# Database backup, every day at 2 AM
# Pattern: start → run job → complete on success, fail on error
# A crontab entry must be a single line; cron does not support backslash line continuation.
0 2 * * * /usr/bin/curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID/start; if /usr/local/bin/backup.sh; then /usr/bin/curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID; else /usr/bin/curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID/fail; fi

# Quick job — heartbeat only, no start signal needed
*/15 * * * * /usr/local/bin/sync-cache.sh && /usr/bin/curl -fsS --retry 3 https://app.flarewarden.com/ping/YOUR-UUID
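
When the crontab line gets long, the same start, run, complete-or-fail pattern can live in a small wrapper script instead. The sketch below is one way to write it; PING_URL, send_ping, and run_monitored are hypothetical names, and the -m 10 timeout is an assumption borrowed from the heartbeat example.

```shell
#!/bin/sh
# Illustrative wrapper; PING_URL, send_ping and run_monitored are
# hypothetical names, not part of FlareWarden.
PING_URL="${PING_URL:-https://app.flarewarden.com/ping/YOUR-UUID}"

# Send one ping; never let a ping failure affect the job.
send_ping() {
    curl -fsS -m 10 --retry 3 "$1" > /dev/null 2>&1 || true
}

# Run a command between /start and /complete (or /fail) pings,
# then return the command's own exit status.
run_monitored() {
    send_ping "$PING_URL/start"
    rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
        send_ping "$PING_URL/complete"
    else
        send_ping "$PING_URL/fail?exit_code=$rc"
    fi
    return "$rc"
}

# Usage inside a job script (not run here):
#   run_monitored /usr/local/bin/backup.sh
```

Because the wrapper returns the job's own exit status, cron mail and any calling script still see the real result.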

Alerting

How cron monitor alerts work

Cron monitors use the same alerting system as all other FlareWarden monitor types. When a monitor transitions to Late or Failed, alerts are sent through your configured channels:

  • Email notifications to team members
  • Webhook deliveries with JSON payloads
  • Status page updates (if the cron monitor is linked to a service)

Recovery alerts are sent when a previously late or failed monitor receives a successful ping.

Status Page Integration

Displaying cron monitor status on public pages

Cron monitors can be displayed on your public status pages alongside uptime and other monitor types. The cron monitor status is shown as part of its parent uptime monitor's service group, giving your users visibility into background job health.

Tip: When a cron monitor is late or failing, its parent service on the status page will reflect the degraded state, so external users see one unified view of service health.

Billing & Limits

How cron monitors count against your plan

Cron monitors share the same unified monitor pool as uptime, content, and dependency monitors. Each cron monitor counts as one monitor against your plan's limit.

Example: On a plan with 50 monitors, you could have 30 uptime monitors, 10 content monitors, 5 dependency monitors, and 5 cron monitors.

Best Practices

Tips for reliable cron monitoring

1. Always use --retry 3

Network blips happen. The --retry flag makes curl retry transient failures (timeouts and some 5xx responses), so a single dropped request is less likely to cause a false alert.

2. Use the start signal for long-running jobs

The /start ping lets FlareWarden detect hung jobs that begin but never complete.

3. Set an appropriate grace period

A grace period that's too short causes false alerts. A good rule is 2× the typical variance in your job's start time.

4. Send error details with fail pings

POST a body with your /fail ping containing the error message or exit code. This appears in the monitor's event log for quick debugging.

5. Don't let pings block your job

Cap each ping with curl's -m (--max-time) option, for example -m 10, and append || true so a failed ping cannot change your job's exit status. The ping should never prevent your actual job from running.

Troubleshooting & FAQ

Common issues and frequently asked questions

Common Issues

My ping returns "404 Not Found"

A 404 means the UUID in your URL does not match any active cron monitor. Common causes:

  • The monitor was deleted and re-created. Deleting a monitor invalidates its UUID. Copy the new ping URL from the replacement monitor's detail page.
  • The UUID is truncated or has extra characters. Verify your URL exactly matches the one shown on the dashboard — it must be 36 characters including hyphens. Shell variable expansion or line wrapping can silently corrupt a UUID.
  • Wrong account. If you manage multiple accounts, check that the ping token came from the right one.

To diagnose, run curl in verbose mode and inspect the response body:

curl -v https://app.flarewarden.com/ping/YOUR-UUID

My ping returns "invalid ping token format" (HTTP 400)

The UUID in the path must match the pattern xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (hexadecimal digits, exactly 36 characters with hyphens in positions 9, 14, 19, and 24).

Typical causes:

  • The URL was copy-pasted with surrounding whitespace or a trailing newline.
  • A shell variable wasn't quoted and word-splitting broke the UUID.
  • The path contains URL-encoded characters (e.g. %7B instead of {) because the UUID placeholder wasn't replaced.

My ping body is rejected with a validation error (HTTP 400)

The response body's errors array tells you exactly which field failed and why. Common fixes:

  • message too long. Truncate your error output to under 1,000 characters before sending. In Bash: MSG="${ERR:0:900}"
  • exit_code out of range. The value must be between −128 and 255. In shell, EXIT=$(( $? & 255 )) masks it to 0–255.
  • duration_ms is negative. Ensure you compute end − start (not the reverse) when measuring elapsed time.
  • Too many metadata keys or values too long. Maximum 5 keys; each key up to 100 characters, each value up to 200 characters.
  • Unknown fields in the JSON body. The JSON decoder is strict. Remove any fields not listed in the request body schema.
  • Body exceeds 10 KB. Don't send full stack traces or log output. Summarise the error in a short message and include the job ID in metadata for cross-referencing.
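
Two of the fixes above, sketched as POSIX shell helpers for scripts that cannot rely on Bash substring syntax. The names clip_message and elapsed_ms are hypothetical, and the truncation assumes ASCII text, since some shells count bytes rather than characters.

```shell
#!/bin/sh
# Illustrative helpers; clip_message and elapsed_ms are hypothetical names.

# Keep at most the first 1000 characters of $1.
# Assumption: ASCII text; with multi-byte input some shells count bytes.
clip_message() {
    printf '%.1000s' "$1"
}

# Elapsed milliseconds between two epoch-second timestamps
# ($1 = start, $2 = end), computed as end - start.
elapsed_ms() {
    echo $(( ($2 - $1) * 1000 ))
}

start=$(date +%s)
# ... job runs here ...
end=$(date +%s)
echo "duration_ms=$(elapsed_ms "$start" "$end")"
```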

My ping returns "429 Too Many Requests"

You're sending more than 60 requests per minute from the same IP. Check the Retry-After response header for how many seconds to wait before retrying.

Common causes of unexpected rate-limit hits:

  • A retry loop is running with no backoff — pings fire as fast as the network allows. Add exponential backoff or use --retry-delay 5 with curl.
  • Many servers or containers pinging through the same NAT/egress IP. Consider consolidating pings or using a different egress path.
  • A bug causes the ping call to execute many times per run (e.g. inside a loop).

My monitor shows "Late" even though the job ran successfully

This means the /complete ping arrived after the grace period elapsed. FlareWarden checks for overdue monitors every 30 seconds and transitions to Late at next_expected_at + grace_period.

  • Grace period is too short. If your job's start time drifts (e.g. cron load on a busy host), increase the grace period to at least 2× the typical variance. For hourly jobs, 5–10 minutes is a good starting point.
  • Ping is sent at the end of a long job. Add a /start ping at the beginning so FlareWarden sees the job is running, then send /complete at the end.
  • Server clock skew. FlareWarden uses UTC. Verify your server clock is synchronised (ntpdate -q pool.ntp.org).
  • Schedule is misconfigured. If the stored interval doesn't match how often the job actually runs, the next expected time is calculated incorrectly. Update the schedule on the monitor's settings page.

A successful /complete ping always returns the monitor to OK and advances the next expected time, regardless of current state.

My monitor is stuck in "Running" state

The monitor received a /start ping but no /complete or /fail has arrived. Possible causes:

  • The job crashed before reaching the success or failure ping.
  • The success ping failed silently because --retry wasn't used and a transient network error dropped it.
  • Your script's error handling doesn't cover all exit paths — the fail ping is only called for some errors.

If a Max Run Duration is configured, the monitor transitions to Failed automatically when it's exceeded. Without that setting, the monitor stays Running until a signal arrives. Send a /complete ping manually to reset it, then fix the script to send signals on all exit paths.
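
One way to cover every exit path is an EXIT trap, which the shell runs however the script ends (apart from signals that cannot be caught, such as SIGKILL). The sketch below is an illustration; PING_URL, send_ping, and report_exit are hypothetical names.

```shell
#!/bin/sh
# Illustrative sketch; PING_URL, send_ping and report_exit are
# hypothetical names, not part of FlareWarden.
PING_URL="${PING_URL:-https://app.flarewarden.com/ping/YOUR-UUID}"

# Send one ping; never let a ping failure affect the job.
send_ping() {
    curl -fsS -m 10 --retry 3 "$1" > /dev/null 2>&1 || true
}

# Called from an EXIT trap: reports success or failure based on the
# script's exit status, whichever path the script took to get here.
report_exit() {
    rc=$?
    if [ "$rc" -eq 0 ]; then
        send_ping "$PING_URL/complete"
    else
        send_ping "$PING_URL/fail?exit_code=$rc"
    fi
}

# In a job script (not run here):
#   trap report_exit EXIT
#   send_ping "$PING_URL/start"
#   ... job steps; any "exit N" or set -e abort still triggers the trap ...
```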

The ping curl call is failing my job's exit code check

Cron pings should never prevent the actual job from running or affect its reported exit code. There are two patterns:

Discard curl's exit status entirely:

curl -fsS --retry 3 "$PING_URL" || true

Capture job exit code before sending the ping:

/usr/local/bin/my-job.sh
JOB_EXIT=$?
if [ "$JOB_EXIT" -eq 0 ]; then
    curl -fsS --retry 3 "$PING_URL" || true
else
    curl -fsS --retry 3 "$PING_URL/fail?exit_code=$JOB_EXIT" || true
fi
exit $JOB_EXIT

The flags -fsS mean: treat HTTP error responses (4xx/5xx) as failures and exit non-zero without printing the response body (-f), suppress the progress meter (-s), but still show error messages (-S).

Frequently Asked Questions

What happens if my ping URL is exposed?

The UUID in the ping URL acts as a secret token. If you suspect it's been compromised, regenerate the UUID from the cron monitor's settings page. The old URL stops working immediately and a new UUID is issued. Treat the ping URL like a password: don't commit it to public repositories.

Can I monitor jobs that don't run on a fixed schedule?

Yes. Set a generous interval (e.g. "every 24 hours") and a long grace period. FlareWarden alerts you only if no ping arrives within interval + grace period. For highly irregular jobs, choose an interval representing the longest acceptable gap between runs and tune the grace period to tolerate natural variance without triggering false alerts.

Is there a rate limit on pings?

Yes: 60 requests per minute per source IP. Normal cron jobs (even at 1-minute intervals with both start and complete pings) are well within this limit. If you hit a 429, check the Retry-After header for when to retry and review the troubleshooting section above for common causes.

Do cron monitors work with the management API?

Yes. You can create, update, list, pause, resume, and delete cron monitors through the authenticated management API (session-based auth), just like content and dependency monitors. The ping_url field in the API response contains the full ready-to-use URL so you can automate provisioning without ever opening the dashboard.

How is the ping routed to the correct region?

FlareWarden runs across multiple Fly.io regions. When a ping arrives, the receiving region looks up the monitor's home region via an in-memory cache backed by a System DB registry. If the ping is at the wrong region, FlareWarden sets a fly-replay response header and Fly's infrastructure re-routes the request to the correct region transparently. From your job's perspective, you just send one curl and get back a 200.

What is the "Pending" state and when does it end?

Every newly created cron monitor starts in Pending state. FlareWarden doesn't know when your job will first run, so it waits for the first successful /complete ping before beginning to track the schedule. No alerts fire during Pending; it simply means the monitor hasn't been established yet. Once the first complete ping arrives, the monitor moves to OK and starts the expect-then-verify loop.

What does "hung job detection" mean and how do I enable it?

If you configure a Max Run Duration on the monitor and a job sends a /start ping but stays in Running state longer than that limit, FlareWarden marks it Failed and opens a "hung job" incident — even though no /fail ping was received. The check runs every 30 seconds.

This catches processes that get stuck waiting on a resource (network timeout, database lock, external API, etc.) without crashing. To enable it, open the monitor's settings, set a Max Run Duration slightly above your job's worst-case expected runtime, and ensure your job sends a /start ping at the beginning.

Can I send structured metadata with pings?

Yes. Include a metadata key in your JSON body with up to 5 string key-value pairs. Metadata is stored with the ping result and visible in the run history on the monitor's detail page. Useful for: deployment version, environment name, record count processed, source S3 bucket, job run ID.

curl -X POST -H "Content-Type: application/json" \
  -d '{"metadata":{"version":"1.4.2","env":"production","records":"8431","job_id":"run-abc123"}}' \
  https://app.flarewarden.com/ping/YOUR-UUID

Note: metadata is only available via JSON body; it cannot be sent as a query parameter or form field.

Can I send error details without writing JSON?

Yes. Use query parameters for the simplest possible integration — no JSON serialisation, no Content-Type header needed:

# URL-encode spaces as + and special characters as %XX
curl -fsS "https://app.flarewarden.com/ping/YOUR-UUID/fail?message=backup+failed%3A+disk+full&exit_code=28"

Or use URL-encoded form data with curl's --data-urlencode for automatic encoding:

curl -fsS -X POST \
  --data-urlencode "message=backup failed: no space left on device" \
  --data-urlencode "exit_code=28" \
  https://app.flarewarden.com/ping/YOUR-UUID/fail

The metadata field requires JSON and is not available via query parameters or form encoding.
