Redesign issue definitions (tracking) #5

## Summary

Track the redesign of the Thread Observability "issues" concept. While this is open, issue detection is paused and the API returns a placeholder envelope (see #4).

This issue is the design home. No code lands here directly — each accepted rule should ship as its own PR referencing this issue.

## Problem with the previous rules

- Most rules restated state already visible on the dashboard (partition count, phantom node count, stale-link count). Surfacing these as "issues" adds noise without insight.
- The rules were opinionated about diagnosis. An AI consumer reading `list_active_issues` would anchor on the labeled framing rather than re-deriving from data. Wrong framing -> wrong investigation path.
- Severity did not reflect actionability. Long-standing benign artifacts ranked alongside fresh anomalies.
- Several rules were not falsifiable — there was no defined "this is cleared" condition.

## Design principles for the new rules

A rule may ship into the issues system only if it satisfies **all** of the following:

1. **Not a restatement of state.** State already exposed by another endpoint (`/v1/topology`, `/v1/partitions`, `/v1/links/stale`, `/v1/nodes`, etc.) is not an issue. An issue is a *claim about* state.
2. **Narrows diagnosis.** A Thread-literate engineer would agree the issue points at a smaller set of causes than the raw data does. If the rule does not reduce the search space, it does not ship.
3. **Falsifiable.** The rule must include an explicit "cleared when ..." condition. If you cannot write the SQL/predicate that closes the issue, the rule is too vague.
4. **Severity = actionability x freshness.** Not novelty. Not noise level. A 6-month-old stale link is lower severity than the same link appearing in the last ingest.
5. **Evidence travels with the issue.** Each issue row carries the EUI64s involved, the observation that triggered it, `first_seen`, `last_seen`, and the supporting metric value(s). Consumers must be able to re-evaluate without trusting the label.
6. **Cap on rule count.** Target 6-10 total. If a candidate rule cannot justify its slot against the others, it does not ship.
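
Principles 4 and 5 give enough structure to sketch. The Python below is a minimal illustration, assuming a hypothetical `IssueRow` type and measuring freshness as exponential decay from `first_seen` with a hypothetical 7-day half-life. Neither the names nor the constant come from the existing codebase, and per-rule actionability weights would still be set during review.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical constant: freshness halves for every 7 days since the issue
# first appeared. Tune during rule review.
FRESHNESS_HALF_LIFE = timedelta(days=7)


@dataclass
class IssueRow:
    """Hypothetical issue row: evidence travels with the label (principle 5)."""
    rule: str
    eui64s: tuple[str, ...]            # devices involved
    observation: str                   # what triggered the issue
    first_seen: datetime
    last_seen: datetime
    metrics: dict[str, float] = field(default_factory=dict)  # supporting values


def freshness(first_seen: datetime, now: datetime) -> float:
    """1.0 for an issue that first appeared now, halving every half-life of age."""
    age = max(now - first_seen, timedelta(0))
    return 0.5 ** (age / FRESHNESS_HALF_LIFE)


def severity(actionability: float, row: IssueRow, now: datetime) -> float:
    """Principle 4: severity = actionability x freshness (not novelty, not noise)."""
    if not 0.0 <= actionability <= 1.0:
        raise ValueError("actionability must be in [0, 1]")
    return actionability * freshness(row.first_seen, now)
```

Multiplying rather than adding means a rule with zero actionability scores zero however fresh it is, and the 6-month-old stale link from principle 4 decays to near zero while the same link first seen in the last ingest keeps its full actionability score.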

## Candidate rules (not yet accepted)

Each candidate needs: trigger predicate, clearing predicate, evidence shape, severity calculation, and a short justification against the principles above. None ship until reviewed against the bar.
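
One way to make these required parts concrete is a small interface every candidate implements. The sketch below is entirely hypothetical: `Snapshot`, `Finding`, `Rule`, and `evaluate` are illustrative names, not existing code, and a snapshot is reduced to a plain dict for brevity. It shows how a mandatory clearing predicate lets the engine close issues on each ingest rather than leaving them open indefinitely.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable

# Hypothetical: one ingest's worth of topology data, simplified to a dict.
Snapshot = dict[str, Any]


@dataclass
class Finding:
    rule: str                  # name of the rule that raised it
    key: str                   # stable identity, e.g. rule name + sorted EUI64s
    evidence: dict[str, Any]   # EUI64s, triggering observation, metric values


@dataclass
class Rule:
    name: str
    trigger: Callable[[Snapshot], Iterable[Finding]]  # trigger predicate
    cleared: Callable[[Snapshot, Finding], bool]      # clearing predicate
    actionability: float                              # severity input, 0..1
    justification: str                                # why it earns its slot


def evaluate(rules: list[Rule], snap: Snapshot,
             active: dict[str, Finding]) -> dict[str, Finding]:
    """Apply one ingest: open or refresh triggered issues, and keep the
    remaining open issues until their rule's clearing predicate holds."""
    by_name = {r.name: r for r in rules}
    result: dict[str, Finding] = {}
    for rule in rules:
        for finding in rule.trigger(snap):
            result[finding.key] = finding
    for key, finding in active.items():
        if key in result:
            continue  # re-triggered this ingest; already refreshed above
        rule = by_name.get(finding.rule)
        # Issues from retired rules (rule is None) are dropped.
        if rule is not None and not rule.cleared(snap, finding):
            result[key] = finding  # not yet cleared: stays open
    return result
```

Separating `trigger` from `cleared` gives each rule room for hysteresis (for example, open on one bad ingest, close only after several good ones), and a `Rule` cannot be constructed without a clearing predicate, which is the point of principle 3.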

- [ ] **Real partition split** — multiple partitions present *and* at least one device seen in more than one partition within a recent window *and* no router-router link bridging them. (NOT: "two partitions exist.")
- [ ] **Dead-link reference** — a router NeighborTable references an EUI64 the registry has never seen, *and* the reference has persisted across N ingests.
- [ ] **Routing loop / unreachable next hop** — `walk_route_to_otbr` terminates in a loop, partition mismatch, or unknown next hop.
- [ ] **Router child-cap saturation** — a parent router is at or above the practical 10-child cap and another node is trying to attach.
- [ ] **OTBR isolation** — OTBR has no neighbors above an LQI threshold and no routes inbound, over a sustained window (rule out boot artifacts).
- [ ] _(add more — each must justify its slot)_
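
The routing-loop candidate is the most mechanical of these, so its trigger predicate can be sketched directly. This assumes a simplified per-ingest view: a hypothetical `next_hop` map from each router's EUI64 to its next hop toward the OTBR, and a hypothetical `partition_of` map from EUI64 to partition ID for every node the registry knows. It does not reflect the actual signature or return value of `walk_route_to_otbr`; it only illustrates the three terminal conditions the rule names.

```python
from enum import Enum


class WalkOutcome(Enum):
    REACHED_OTBR = "reached_otbr"
    LOOP = "loop"
    PARTITION_MISMATCH = "partition_mismatch"
    UNKNOWN_NEXT_HOP = "unknown_next_hop"


def classify_route_walk(start: str, otbr: str,
                        next_hop: dict[str, str],
                        partition_of: dict[str, int]) -> tuple[WalkOutcome, list[str]]:
    """Follow next hops from `start` toward `otbr` (hypothetical inputs).

    Returns the terminal outcome plus the path walked, so the path can be
    attached to the issue as evidence (principle 5).
    """
    home = partition_of[start]  # start must be a node the registry knows
    path = [start]
    seen = {start}
    current = start
    while current != otbr:
        hop = next_hop.get(current)
        if hop is None or hop not in partition_of:
            # No route entry, or a next hop the registry has never seen.
            return WalkOutcome.UNKNOWN_NEXT_HOP, path
        path.append(hop)
        if partition_of[hop] != home:
            return WalkOutcome.PARTITION_MISMATCH, path
        if hop in seen:
            return WalkOutcome.LOOP, path
        seen.add(hop)
        current = hop
    return WalkOutcome.REACHED_OTBR, path
```

Under this sketch the trigger is any outcome other than `REACHED_OTBR`, and a natural clearing predicate is the same walk reaching the OTBR again, possibly for several consecutive ingests to ride out route convergence.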

For each accepted candidate, open a child PR titled `issues: add rule <name>` linking back here.

## Non-goals

- Reviving any of the old rules verbatim. They have to pass the bar above to come back.
- Designing a UI for issues. The placeholder card (#4) is sufficient until rules exist.
- AI-side reasoning playbooks. Those are downstream of having trustworthy issues.

## Definition of done for this tracking issue

- At least 3 rules accepted against the principles above and shipped.
- Placeholder note in `/v1/issues` and the dashboard card removed (or replaced with real content).
- `list_active_issues` MCP tool documented with the new rule taxonomy.

Related: #4 (placeholder implementation).