Skip to content

Monitoring

Patterns for monitoring taskora in production.

Looking for the admin UI?

For an in-process, batteries-included dashboard with workflow DAGs, schedule management, DLQ, and live SSE updates, see Board. This page covers long-term metrics pipelines (Grafana / Datadog / Prometheus) — the board is not a replacement for those, it complements them.

Health Check Endpoint

ts
app.get("/health/queue", async (req, res) => {
  const stats = await taskora.inspect().stats()

  const healthy =
    stats.failed < 100 &&
    stats.waiting < 10_000

  res.status(healthy ? 200 : 503).json(stats)
})

Metrics Collection

Use events to feed metrics into Prometheus, Datadog, or any metrics system:

ts
taskora.on("task:completed", ({ task, duration }) => {
  metrics.histogram("taskora.job.duration_ms", duration, { task })
  metrics.increment("taskora.job.completed", { task })
})

taskora.on("task:failed", ({ task, willRetry }) => {
  if (willRetry) {
    metrics.increment("taskora.job.retried", { task })
  } else {
    metrics.increment("taskora.job.failed", { task })
  }
})

taskora.on("task:stalled", ({ task, action }) => {
  metrics.increment("taskora.job.stalled", { task, action })
})

Periodic Stats Collection

ts
setInterval(async () => {
  const stats = await taskora.inspect().stats()
  metrics.gauge("taskora.queue.waiting", stats.waiting)
  metrics.gauge("taskora.queue.active", stats.active)
  metrics.gauge("taskora.queue.delayed", stats.delayed)
  metrics.gauge("taskora.queue.failed", stats.failed)
}, 10_000)

Alert Thresholds

Suggested alerts for production:

MetricWarningCritical
failed count> 50> 200
waiting depth> 5,000> 50,000
Job duration P99> 30s> 120s
Stalled jobs/hour> 5> 20

Dashboard Queries

Example queries for a Grafana/Prometheus dashboard:

  • Throughput: rate(taskora_job_completed_total[5m]) by task
  • Error rate: rate(taskora_job_failed_total[5m]) / rate(taskora_job_completed_total[5m])
  • Queue depth: taskora_queue_waiting by task
  • Latency P95: histogram_quantile(0.95, taskora_job_duration_ms_bucket)

Inspecting Individual Jobs

ts
const job = await taskora.inspect().find(jobId)
if (job) {
  console.log(`State: ${job.state}`)
  console.log(`Attempt: ${job.attempt}`)
  console.log(`Error: ${job.error}`)
  console.log(`Logs:`, job.logs)
  console.log(`Timeline:`, job.timeline)
}

Released under the MIT License.