A multi-tenant Next.js application on Vercel, reframed as a support engineer's incident lab. One deployment serves many tenants via host-based routing; an admin console toggles deterministic, reproducible, observable, documented incidents — so you can demonstrate the full support loop: reproduce → observe → root-cause → fix → regression test → runbook.
Built as a portfolio project for a Vercel Senior Customer Support Engineer role.
See docs/ARCHITECTURE.md for the design. (The underlying
strategy notes are kept locally and are not part of this repo.)
It exercises the exact terrain of Vercel support tickets — host routing, custom domains and wildcard DNS, serverless timeouts, cache behavior, and trace-based debugging — and pairs each failure with a customer-facing runbook. It demonstrates building internal tooling/scripts and writing durable docs, the two things the role calls out most.
Screenshots: tenant directory, serverless timeout (504), incident console, custom-domain diagnosis.
```shell
npm install
npm run dev
# open http://localhost:3000           (tenant list)
#      http://slow-api.localhost:3000  (a wired incident: 504)
#      http://localhost:3000/admin     (toggle incidents)
```

Chrome and Firefox resolve *.localhost automatically. Otherwise, send a Host header:

```shell
curl -i -H "Host: slow-api.localhost:3000" http://127.0.0.1:3000/
```

Tenant subdomain routing needs a wildcard custom domain. On the default Vercel URL,
reach tenants by path instead: …vercel.app/s/slow-api, /s/stale-cache,
/s/big-upload, /s/missing-trace. The landing page, /admin, and the APIs work
normally. Set ROOT_DOMAIN to a wildcard domain you own to enable real subdomain
routing; see docs/DEPLOY.md.
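
As a rough illustration of the host-to-tenant step described above, here is a minimal sketch. It is not the code in middleware.ts or lib/subdomain.ts; `tenantFromHost` and `rewritePath` are hypothetical names, and the rule that only single-label subdomains count as tenants is an assumption.

```typescript
// Hypothetical helper for illustration; the repo's lib/subdomain.ts may differ.
// rootDomain stands in for the ROOT_DOMAIN setting mentioned above.
export function tenantFromHost(host: string, rootDomain: string): string | null {
  // Drop the port and normalise case: "Slow-API.localhost:3000" -> "slow-api.localhost"
  const hostname = host.split(":")[0].toLowerCase();
  const root = rootDomain.split(":")[0].toLowerCase();

  // The bare root domain serves the landing page, not a tenant.
  if (hostname === root) return null;

  // Only hosts ending in ".<root>" can carry a tenant label.
  const suffix = `.${root}`;
  if (!hostname.endsWith(suffix)) return null;

  const label = hostname.slice(0, -suffix.length);
  // Assumption: reject nested labels ("a.b.localhost") and anything that is
  // not a plain DNS label.
  return /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/.test(label) ? label : null;
}

// Path a middleware could rewrite to, mirroring the /s/<tenant> routes above.
export function rewritePath(host: string, rootDomain: string, pathname: string): string {
  const tenant = tenantFromHost(host, rootDomain);
  if (tenant === null) return pathname;
  return `/s/${tenant}${pathname === "/" ? "" : pathname}`;
}
```

With these assumptions, a request for `/` on `slow-api.localhost:3000` would map to `/s/slow-api`, while `/admin` on the root host passes through unchanged.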
| Piece | File |
|---|---|
| Host → tenant routing | middleware.ts, lib/subdomain.ts |
| Tenant + incident store (seeded, in-memory) | lib/tenants.ts |
| Incident catalog (source of truth) | lib/incidents.ts |
| Fault injection | app/api/upstream/route.ts |
| Tenant page (wires serverless-timeout) | app/s/[subdomain]/page.tsx |
| Incident console | app/admin/page.tsx |
| Observability | instrumentation.ts + Speed Insights |
Toggle any incident in /admin. Five are fully wired — serverless-timeout (504),
cache-regression (stale cache after publish), payload-too-large (413), broken-trace
(orphaned downstream spans), and invalid-domain (wildcard-SSL/DNS diagnosis) — each with
a regression test and a playbook. wrong-tenant is guarded by an integration test. See
docs/incidents/ and the per-incident table in the admin console.
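
One way to keep toggled incidents deterministic is to resolve them through a pure function, so the same toggles and request always yield the same fault. The sketch below is an assumption about shape only, not the contents of lib/incidents.ts or app/api/upstream/route.ts. The 504 and 413 status codes come from the list above; `resolveFault`, the delay, and the body-size limit are hypothetical placeholders.

```typescript
// Hypothetical types; the real catalog lives in lib/incidents.ts and may differ.
type IncidentId = "serverless-timeout" | "payload-too-large";

interface Fault {
  status: number;  // HTTP status the injected failure should produce
  delayMs: number; // how long to stall before answering
}

// Placeholder values, not taken from the repo or from Vercel limits.
const ASSUMED_TIMEOUT_MS = 10_000;
const ASSUMED_MAX_BODY_BYTES = 4_500_000;

// Pure function: no clock, no randomness, so a given input always
// reproduces the same failure.
export function resolveFault(
  active: ReadonlySet<IncidentId>,
  bodyBytes: number,
): Fault | null {
  if (active.has("payload-too-large") && bodyBytes > ASSUMED_MAX_BODY_BYTES) {
    return { status: 413, delayMs: 0 };
  }
  if (active.has("serverless-timeout")) {
    return { status: 504, delayMs: ASSUMED_TIMEOUT_MS };
  }
  return null; // healthy path: no fault injected
}
```

A route handler could call `resolveFault(new Set<IncidentId>(["serverless-timeout"]), 0)` and translate the result into a delayed 504 response; a regression test can assert on the same function without starting a server.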
@vercel/otel registers tracing (infrastructure / fetch / framework spans);
@vercel/speed-insights reports Core Web Vitals. For local traces, bring up the bundled
OpenTelemetry stack and point the app at it:
```shell
docker compose -f infra/otel/docker-compose.yml up -d
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 npm run dev
# Jaeger http://localhost:16686 · Zipkin http://localhost:9411 · Prometheus http://localhost:9090
```

See docs/observability.md for the full walkthrough. npm run otel:smoke asserts that a request actually produces a trace in Jaeger; the trace-debug skill
documents the symptom → first-signal mapping.
bash scripts/dns-check.sh <domain> runs the dig/openssl checks Vercel's domain
docs recommend; scripts/add-domain.ts onboards a domain via @vercel/sdk. The
dns-triage skill classifies failures (invalid config, pending TXT, wildcard SSL).
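
The three failure classes named above can be pictured as a small decision function. This is only a sketch of the idea, not the dns-triage skill itself: the `DomainObservation` fields, the `triage` name, and the order of the checks are all assumptions made for illustration.

```typescript
// Hypothetical observation shape; field names are illustrative only.
interface DomainObservation {
  pointsAtExpectedTarget: boolean; // A/CNAME resolves where the platform expects
  txtVerified: boolean;            // ownership TXT record has been observed
  isWildcard: boolean;             // domain has the form *.example.com
  certCoversHost: boolean;         // served certificate matches the requested host
}

type Triage =
  | "invalid-config"
  | "pending-txt"
  | "wildcard-ssl"
  | "unclassified"
  | "ok";

// Assumed ordering: DNS target first, then ownership, then certificate.
export function triage(o: DomainObservation): Triage {
  if (!o.pointsAtExpectedTarget) return "invalid-config";
  if (!o.txtVerified) return "pending-txt";
  if (!o.certCoversHost) return o.isWildcard ? "wildcard-ssl" : "unclassified";
  return "ok";
}
```

Checking the DNS target before the certificate reflects the assumption that a certificate cannot be issued until records are correct, so an earlier failure usually explains a later one.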
.github/workflows/ci.yml runs lint, typecheck, test,
build, and a k6 smoke test on PRs. Gate production promotion on these via Vercel
Deployment Checks; every incident links to a preview URL.
perf/smoke.js encodes the support SLOs (healthy-tenant p95 latency,
zero 5xx under light load). Pair k6 numbers with Speed Insights and x-vercel-cache.
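
To make the two SLOs concrete, the sketch below checks a p95 latency budget and a zero-5xx rule over a list of samples. It does not reproduce perf/smoke.js: `Sample`, `percentile`, and `meetsSlo` are hypothetical names, the budget is whatever the caller passes in, and the nearest-rank percentile used here may differ slightly from how k6 computes its own percentiles.

```typescript
interface Sample {
  durationMs: number;
  status: number;
}

// Nearest-rank percentile: the smallest value with at least p% of the
// samples at or below it.
export function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  // Multiply before dividing so that whole-number ranks stay exact.
  const rank = Math.ceil((p * sorted.length) / 100);
  const clamped = Math.min(sorted.length, Math.max(1, rank));
  return sorted[clamped - 1];
}

// True only when p95 latency is within budget and no response was a 5xx.
export function meetsSlo(samples: Sample[], p95BudgetMs: number): boolean {
  const p95 = percentile(samples.map((s) => s.durationMs), 95);
  const serverErrors = samples.filter((s) => s.status >= 500).length;
  return p95 <= p95BudgetMs && serverErrors === 0;
}
```

A single 5xx fails the check even when latency is well within budget, which matches the "zero 5xx" wording above.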
This repo ships custom skills under .claude/skills/ that encode the
support loop: incident-repro, trace-debug, dns-triage, runbook-writer.
| Task | Command |
|---|---|
| Dev | npm run dev |
| Build | npm run build |
| Type-check | npm run typecheck |
| Lint | npm run lint |
| Unit tests | npm test |
| Browser e2e | npm run test:e2e (needs npx playwright install chromium once) |
- Tenant store is in-memory (documented swap point for Redis/Edge Config).
- External APM (Sentry/Datadog) and log drains are documented, not wired.
- Runs on the Vercel Hobby plan. Plan-gated features stay documented-not-wired: Log Drains (Pro+), multi-tenant preview URLs (Enterprise), adjustable function memory (Pro+). Trace inspection is via the local OTel stack, not the Vercel dashboard. See the "Running on the Hobby plan" section in docs/DEPLOY.md.
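
The in-memory store's swap point can be pictured as an interface with an async surface, so a networked backend could replace it without touching callers. This is a sketch under assumptions, not the code in lib/tenants.ts: `Tenant`, `TenantStore`, and `InMemoryTenantStore` are hypothetical names and shapes.

```typescript
// Hypothetical shapes; the real ones live in lib/tenants.ts.
interface Tenant {
  subdomain: string;
  name: string;
}

// Async on purpose: a Redis- or Edge Config-backed store would need to await I/O.
interface TenantStore {
  get(subdomain: string): Promise<Tenant | null>;
  list(): Promise<Tenant[]>;
}

export class InMemoryTenantStore implements TenantStore {
  private readonly tenants: Map<string, Tenant>;

  constructor(seed: Tenant[]) {
    // Seeded once at construction; nothing persists across restarts.
    this.tenants = new Map<string, Tenant>(
      seed.map((t): [string, Tenant] => [t.subdomain, t]),
    );
  }

  async get(subdomain: string): Promise<Tenant | null> {
    return this.tenants.get(subdomain) ?? null;
  }

  async list(): Promise<Tenant[]> {
    return [...this.tenants.values()];
  }
}
```

Callers that depend only on `TenantStore` would not need to change if a persistent implementation were added later.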
.claude/settings.json ships protective deny rules only. To let Claude run the dev
commands without prompting, add an allow list (e.g. Bash(npm run:*), Bash(dig:*))
yourself — or run /fewer-permission-prompts.



