why
Datadog and Grafana Cloud are overkill for a side project. Setting up Prometheus + Grafana + Alertmanager is a weekend. WatchTower is the in-between: drop a middleware into your Express app, stand up a collector somewhere, and start getting Slack pings when things break.
the agent
watchtower-agent is Express middleware. One line of bootstrap and you’re done:
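import express from "express";
import { watchtower } from "watchtower-agent";

const app = express();

// One call registers the middleware for every route mounted after it.
app.use(watchtower({
  serviceName: "my-api",
  collector: "http://localhost:9090",
}));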
It captures, via prom-client:
- HTTP request duration as a histogram, from which p50, p95, and p99 are read
- HTTP requests total, labelled by method, route, and status code
- Node defaults: CPU, memory, event loop lag, GC
It then pushes Prometheus text to the collector every 15 seconds. (Yes, push, not scrape. The collector doesn’t need to reach into your network.) The agent can also expose a local /metrics endpoint if you want to keep Prometheus in the loop later.
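A rough sketch of what that push loop could look like, not the agent's actual source. The helper names are made up for illustration, and collect stands in for whatever renders the registry to text (with prom-client that would be register.metrics()). Only the 15-second interval, the POST /ingest route, and the X-Service-Name header come from the description here.

```typescript
// Hypothetical helpers; not part of the watchtower-agent API.
interface PushRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the request that carries one Prometheus text snapshot to the collector.
function buildPushRequest(collector: string, serviceName: string, metricsText: string): PushRequest {
  return {
    url: new URL("/ingest", collector).toString(),
    init: {
      method: "POST",
      headers: {
        "Content-Type": "text/plain; version=0.0.4", // Prometheus text exposition format
        "X-Service-Name": serviceName,
      },
      body: metricsText,
    },
  };
}

// Push on a fixed interval and return a function that stops the loop.
function startPushing(
  collector: string,
  serviceName: string,
  collect: () => Promise<string>, // assumption: renders current metrics as text
  intervalMs: number = 15000,
): () => void {
  const timer = setInterval(async () => {
    try {
      const { url, init } = buildPushRequest(collector, serviceName, await collect());
      await fetch(url, init);
    } catch (err) {
      // A failed push should never take the host app down; log and try again next tick.
      console.error("watchtower push failed:", err);
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Swallowing push errors is a deliberate choice in this sketch: monitoring that can crash the thing it monitors is worse than a missed sample.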
the collector
watchtower-collector listens on :9090:
- POST /ingest accepts the agent’s Prometheus text. The agent identifies itself with an X-Service-Name header.
- GET /metrics/:service returns the latest snapshot.
- GET /services lists everyone who has reported in.
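To make the three routes concrete, here is a framework-free sketch of the routing logic over an in-memory map. The Req/Res shapes, the status codes, and the JSON response for /services are assumptions; the description above only names the routes and the header.

```typescript
// Hypothetical request/response shapes, kept minimal so the logic runs without a server.
interface Req { method: string; path: string; headers: Record<string, string>; body?: string }
interface Res { status: number; body: string }

// service name -> latest Prometheus text snapshot
const latest = new Map<string, string>();

function handle(req: Req): Res {
  if (req.method === "POST" && req.path === "/ingest") {
    // Node lowercases incoming header names, so look up the lowercase key.
    const service = req.headers["x-service-name"];
    if (!service) return { status: 400, body: "missing X-Service-Name" };
    latest.set(service, req.body ?? "");
    return { status: 204, body: "" };
  }
  if (req.method === "GET" && req.path === "/services") {
    return { status: 200, body: JSON.stringify(Array.from(latest.keys())) };
  }
  const match = req.method === "GET" ? /^\/metrics\/([^/]+)$/.exec(req.path) : null;
  if (match) {
    const snapshot = latest.get(decodeURIComponent(match[1] ?? ""));
    return snapshot === undefined
      ? { status: 404, body: "unknown service" }
      : { status: 200, body: snapshot };
  }
  return { status: 404, body: "not found" };
}
```

Keeping the handler a pure function of the request makes it easy to mount under Express, node:http, or anything else.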
Samples land in a ring buffer so memory is bounded. Every 30 seconds an alert evaluator walks the rules in alerts.json:
{
  "name": "high-p95-latency",
  "service": "*",
  "metric": "http_request_duration_ms",
  "condition": "p95 > 400",
  "window": 4,
  "channels": ["slack", "discord", "email"]
}
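The two ideas in this section, bounded storage and rule evaluation, can be sketched together. This is an illustration under assumptions, not the collector's code: RingBuffer, parseCondition, percentile, and conditionHolds are hypothetical names, the percentile uses the nearest-rank method over raw samples, and operators other than > are a guess. The meaning of window isn't spelled out here, so the sketch ignores it.

```typescript
// Fixed-capacity ring buffer: once full, each push overwrites the oldest entry.
class RingBuffer<T> {
  private readonly items: T[] = [];
  private readonly capacity: number;
  private next = 0; // slot the next push overwrites once the buffer is full

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  push(item: T): void {
    if (this.items.length < this.capacity) {
      this.items.push(item);
    } else {
      this.items[this.next] = item;
      this.next = (this.next + 1) % this.capacity;
    }
  }

  // Oldest-to-newest copy of the current contents.
  toArray(): T[] {
    return this.items.slice(this.next).concat(this.items.slice(0, this.next));
  }
}

type Op = ">" | ">=" | "<" | "<=";
interface Condition { percentile: number; op: Op; threshold: number }

// Parse conditions shaped like "p95 > 400".
function parseCondition(text: string): Condition {
  const m = /^p(\d{1,2})\s*(>=|<=|>|<)\s*(\d+(?:\.\d+)?)$/.exec(text.trim());
  if (!m) throw new Error("unsupported condition: " + text);
  return { percentile: Number(m[1]), op: m[2] as Op, threshold: Number(m[3]) };
}

// Nearest-rank percentile; NaN for an empty sample set.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1] ?? NaN;
}

// True when the condition holds; NaN compares false, so no data never fires.
function conditionHolds(cond: Condition, values: number[]): boolean {
  const v = percentile(values, cond.percentile);
  switch (cond.op) {
    case ">": return v > cond.threshold;
    case ">=": return v >= cond.threshold;
    case "<": return v < cond.threshold;
    case "<=": return v <= cond.threshold;
  }
}
```

An evaluator tick would then be roughly conditionHolds(parseCondition(rule.condition), buffer.toArray()) for each rule and matching service.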
When a rule fires, the dispatcher fans out to the channels the rule lists: a Slack webhook, a Discord webhook, and email via Resend. Each rule has a 5-minute dedup window so a flapping condition doesn’t carpet-bomb your channels.
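A minimal sketch of dedup plus fan-out, under assumptions: Deduper, Sender, and dispatch are hypothetical names, the dedup key is the rule name alone (the real collector may also key on service), and each channel is modelled as an async function that takes a message.

```typescript
const DEDUP_WINDOW_MS = 5 * 60 * 1000; // the 5-minute window described above

class Deduper {
  private readonly lastSent = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs: number = DEDUP_WINDOW_MS) {
    this.windowMs = windowMs;
  }

  // Returns true, and records the send, when the rule is outside its dedup window.
  shouldSend(ruleName: string, nowMs: number = Date.now()): boolean {
    const last = this.lastSent.get(ruleName);
    if (last !== undefined && nowMs - last < this.windowMs) return false;
    this.lastSent.set(ruleName, nowMs);
    return true;
  }
}

type Sender = (message: string) => Promise<void>;

// Send to every channel the rule lists; resolves with the channels that succeeded.
async function dispatch(
  rule: { name: string; channels: string[] },
  senders: Record<string, Sender>,
  dedup: Deduper,
  message: string,
  nowMs: number = Date.now(),
): Promise<string[]> {
  if (!dedup.shouldSend(rule.name, nowMs)) return [];
  const delivered: string[] = [];
  await Promise.all(
    rule.channels.map(async (channel) => {
      const send = senders[channel];
      if (!send) return; // channel not configured
      try {
        await send(message);
        delivered.push(channel);
      } catch {
        // One failing channel should not block the others.
      }
    }),
  );
  return delivered;
}
```

Recording the send time before the fan-out means a rule that fires while every channel is down still waits out the window; whether that is the right trade-off depends on how much you trust your webhooks.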
what this isn’t
A replacement for Prometheus + Grafana in production. WatchTower is a focused agent / collector / alerting pipeline for teams that want observability without the setup overhead. For real scale, use OpenTelemetry.