Open
Labels
- effort: small (quick wins, <1 day effort)
- infrastructure (DevOps, CI/CD, monitoring)
- priority: low (nice to have)
Description
Problem
No monitoring means:
- Downtime invisible until users report
- No SLA tracking
- Can't measure availability
- Slow incident response
Solution
Use UptimeRobot (free tier) for basic monitoring:
UptimeRobot Setup
- Create account: https://uptimerobot.com
- Add HTTP(s) monitor:
  - URL: https://controlforge.dev
  - Type: HTTP(s)
  - Interval: 5 minutes (free tier)
- Add a monitor for each endpoint:
  - Homepage: https://controlforge.dev
  - Docs: https://controlforge.dev/docs
  - Health: https://controlforge.dev/api/health (if created)
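If the monitors were created through the API instead of the dashboard, the setup steps above might be scripted roughly as follows. This is a sketch under assumptions: the parameter names (`api_key`, `format`, `type`, `url`, `friendly_name`, `interval`) and the `type` value `1` for HTTP(s) reflect the UptimeRobot v2 `newMonitor` endpoint as I understand it and should be checked against the current API docs. `MonitorSpec` and `buildNewMonitorForm` are hypothetical names.

```typescript
// Sketch: build the form-encoded body for one UptimeRobot monitor.
// Assumption: field names follow the v2 `newMonitor` endpoint; verify before use.

interface MonitorSpec {
  friendlyName: string;
  url: string;
}

// The three endpoints listed in the setup steps.
const MONITORS: MonitorSpec[] = [
  { friendlyName: "Homepage", url: "https://controlforge.dev" },
  { friendlyName: "Docs", url: "https://controlforge.dev/docs" },
  { friendlyName: "Health", url: "https://controlforge.dev/api/health" },
];

function buildNewMonitorForm(apiKey: string, spec: MonitorSpec): string {
  const fields: Record<string, string> = {
    api_key: apiKey,
    format: "json",
    type: "1", // assumed: 1 = HTTP(s) monitor
    url: spec.url,
    friendly_name: spec.friendlyName,
    interval: "300", // seconds, i.e. the 5-minute free-tier interval
  };
  // Encode as application/x-www-form-urlencoded, keeping insertion order.
  return Object.keys(fields)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(fields[key])}`)
    .join("&");
}
```

Each resulting string would be sent as the body of a `POST` request with `Content-Type: application/x-www-form-urlencoded`. The API key belongs in an environment variable or secret store, not in the repository.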
Alert Channels
- Slack webhook
- Discord webhook
- SMS (paid tier)
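UptimeRobot's built-in integrations normally handle delivery to these channels. If an alert ever had to be posted from a custom script, the payloads could be built as sketched below. Slack incoming webhooks read a `text` field and Discord webhooks read a `content` field; `DownAlert`, `formatAlert`, and `buildWebhookPayload` are hypothetical names.

```typescript
// Sketch: JSON bodies for Slack and Discord webhook alerts.

type Channel = "slack" | "discord";

interface DownAlert {
  url: string;
  statusCode: number | null; // null when the request got no response at all
  checkedAt: string; // ISO 8601 timestamp of the failed check
}

function formatAlert(alert: DownAlert): string {
  const status = alert.statusCode === null ? "no response" : `HTTP ${alert.statusCode}`;
  return `DOWN: ${alert.url} (${status}) at ${alert.checkedAt}`;
}

function buildWebhookPayload(channel: Channel, alert: DownAlert): string {
  const message = formatAlert(alert);
  // Slack expects { text }, Discord expects { content }.
  const payload = channel === "slack" ? { text: message } : { content: message };
  return JSON.stringify(payload);
}
```

The returned string would be sent as the body of a `POST` to the webhook URL with `Content-Type: application/json`.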
Health Check Endpoint
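```typescript
// src/routes/api/health/+server.ts
// Liveness endpoint for uptime monitors to poll.
export async function GET() {
  const health = {
    status: 'healthy',
    timestamp: new Date().toISOString(),
    // Seconds since the server process started (Node.js runtimes only)
    uptime: process.uptime(),
    // Commit hash injected at build time; falls back when the variable is unset
    version: import.meta.env.VITE_BUILD_COMMIT ?? 'unknown'
  };
  return new Response(JSON.stringify(health), {
    headers: { 'Content-Type': 'application/json' }
  });
}
```
Alternative: BetterUptime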
More features, better UX:
- Status page (public)
- Incident management
- On-call scheduling
- Free tier: 10 monitors
Monitoring Checklist
- Homepage (https://controlforge.dev)
- Documentation (https://controlforge.dev/docs)
- API health endpoint
- SSL certificate expiry
- DNS resolution
- Response time (<500ms)
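Two of the checklist items reduce to simple comparisons once a probe has run. The sketch below classifies a probe result against the 500 ms target and computes the days left on a certificate. `ProbeResult`, `classifyProbe`, and `daysUntilExpiry` are hypothetical names, and treating any 2xx or 3xx status as reachable is an assumption.

```typescript
// Sketch: evaluate one probe result against the checklist targets.

interface ProbeResult {
  url: string;
  statusCode: number | null; // null when the request failed outright (DNS, TLS, timeout)
  elapsedMs: number;
}

type Verdict = "ok" | "slow" | "down";

const RESPONSE_BUDGET_MS = 500; // checklist target: response time < 500 ms

function classifyProbe(result: ProbeResult, budgetMs: number = RESPONSE_BUDGET_MS): Verdict {
  // Assumption: 2xx and 3xx both count as reachable.
  const reachable =
    result.statusCode !== null && result.statusCode >= 200 && result.statusCode < 400;
  if (!reachable) return "down";
  return result.elapsedMs < budgetMs ? "ok" : "slow";
}

// Whole days between `now` and the certificate's notAfter date.
function daysUntilExpiry(notAfter: Date, now: Date): number {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return Math.floor((notAfter.getTime() - now.getTime()) / MS_PER_DAY);
}
```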
Status Page
Create public status page showing:
- Uptime % (30/60/90 day)
- Current status
- Incident history
- Scheduled maintenance
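The 30/60/90-day uptime figures could be derived from stored check results roughly as follows. This is a sketch that assumes checks run at a fixed interval, so the share of successful checks approximates the share of time the site was up. `CheckRecord` and `uptimePercent` are hypothetical names.

```typescript
// Sketch: uptime percentage over a trailing window of days.

interface CheckRecord {
  timestamp: Date;
  up: boolean;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the percentage rounded to two decimals, or null when the window has no data.
function uptimePercent(checks: CheckRecord[], windowDays: number, now: Date): number | null {
  const end = now.getTime();
  const start = end - windowDays * MS_PER_DAY;
  const inWindow = checks.filter((c) => {
    const t = c.timestamp.getTime();
    return t >= start && t <= end;
  });
  if (inWindow.length === 0) return null;
  const upCount = inWindow.filter((c) => c.up).length;
  return Math.round((upCount / inWindow.length) * 10000) / 100;
}
```

Calling it with `windowDays` set to 30, 60, and 90 gives the three figures for the status page.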
```svelte
<!-- src/routes/status/+page.svelte -->
<script>
  let status = { operational: true, uptime: 99.9 };
</script>

<h1>System Status</h1>
<div class="status {status.operational ? 'operational' : 'outage'}">
  {status.operational ? '✓ All Systems Operational' : '⚠️ Experiencing Issues'}
</div>
<div class="uptime">
  <h2>Uptime</h2>
  <p>30 days: {status.uptime}%</p>
</div>
```
Alert Configuration
Send alerts when:
- Site down for >2 minutes (requires a check interval shorter than the free tier's 5 minutes)
- Response time >3 seconds
- SSL certificate expires <30 days
- 4xx/5xx error rate >5%
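The four thresholds above can be written as one rule check. This is a minimal sketch: `Snapshot` and `evaluateAlerts` are hypothetical names, and the snapshot values are assumed to come from whichever monitoring service ends up in use.

```typescript
// Sketch: evaluate the alert thresholds against one metrics snapshot.

interface Snapshot {
  downForSeconds: number; // 0 when the site is currently up
  responseTimeMs: number;
  certDaysRemaining: number;
  errorRate: number; // fraction of responses that were 4xx/5xx, from 0 to 1
}

type AlertKind = "site-down" | "slow-response" | "cert-expiring" | "error-rate";

function evaluateAlerts(s: Snapshot): AlertKind[] {
  const alerts: AlertKind[] = [];
  if (s.downForSeconds > 120) alerts.push("site-down"); // down for > 2 minutes
  if (s.responseTimeMs > 3000) alerts.push("slow-response"); // response time > 3 seconds
  if (s.certDaysRemaining < 30) alerts.push("cert-expiring"); // certificate expires in < 30 days
  if (s.errorRate > 0.05) alerts.push("error-rate"); // 4xx/5xx rate > 5%
  return alerts;
}
```

An empty array means nothing needs to be sent. Each returned kind would map to one notification.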
Success Criteria
- Monitoring configured for all endpoints
- Alerts sent to email/Slack
- Public status page available
- Historical uptime data tracked
- Response time monitored