
feat(scorers): artifact scorer suite — bibliography, governance receipt, ARCANA essay#8

Merged
hummbl-dev merged 4 commits into main from feat/claude/artifact-scorer-suite on Apr 16, 2026
Conversation

@hummbl-dev
Owner

Summary

Adds the full Arbiter artifact scorer suite — a penalty-based, multi-dimensional quality scoring system for knowledge artifacts before CLP ingest.

  • artifact_scorer.py — base framework: ArtifactScorer(ABC), ArtifactScorerRegistry, DEFAULT_REGISTRY, _score_from_findings() penalty engine (CRITICAL=-25, HIGH=-15, MEDIUM=-7, LOW=-3), A–F grade scale
  • BibliographyScorer — 5 dimensions: DOI coverage, tier distribution, tag density, completeness, source density
  • GovernanceReceiptScorer — 5 dimensions: completeness, chain_of_custody, timestamp_validity, evidence_ratio, schema_compliance; EU AI Act Article 12 + NIST AI RMF GOVERN 1.2 aligned
  • ArcanaEssayScorer — adversarial gate for ARCANA synthesis essays; 5 dimensions: empirical_grounding (0.30), citation_density (0.25), structural_integrity (0.20), source_diversity (0.15), on_topic_ratio (0.10); rules ARC101–502; blocks grade F from CLP ingest
  • scorers/__init__.py — registers all 3 scorers into DEFAULT_REGISTRY
  • 51 tests — all green
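The penalty engine and grade scale above are small enough to sketch. This is a minimal illustration of the start-at-100, subtract-per-finding model; the names (`Finding`, `Severity`, `PENALTIES`, `score_from_findings`, `letter_grade`) are assumptions standing in for `_score_from_findings()` and friends, not the actual API.

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

# Penalty per finding, as listed in the summary above.
PENALTIES = {
    Severity.CRITICAL: 25,
    Severity.HIGH: 15,
    Severity.MEDIUM: 7,
    Severity.LOW: 3,
}

@dataclass
class Finding:
    rule: str        # e.g. "ARC304"
    severity: Severity
    dimension: str   # e.g. "empirical_grounding"

def score_from_findings(findings: list) -> float:
    """Start at 100 and subtract a fixed penalty per finding, floored at 0."""
    score = 100.0
    for finding in findings:
        score -= PENALTIES[finding.severity]
    return max(score, 0.0)

def letter_grade(score: float) -> str:
    """Grade scale from the test plan: A>=90, B>=80, C>=70, D>=60, F<60."""
    for grade, cutoff in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if score >= cutoff:
            return grade
    return "F"
```

Under this model a single HIGH finding already drops an otherwise-clean artifact from A to B, which matches the "penalty-based" framing.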

Test plan

  • test_artifact_scorer.py — 51 tests covering all 3 scorers + base framework
  • Grade scale verified: A≥90, B≥80, C≥70, D≥60, F<60
  • ARC304 rule: zero citations → HIGH severity on empirical_grounding
  • ArcanaEssayScorer blocks grade F from ledger ingest (wired in arcana_ingest.py)

🤖 Generated with Claude Code

Claude (agent) and others added 2 commits April 5, 2026 22:00
Only registered runner (windows-desktop-1) is offline. Arbiter is pure
Python — ubuntu-latest works fine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the Arbiter knowledge artifact scoring infrastructure:

- ArtifactScorer (ABC) + ArtifactScorerRegistry + _score_from_findings() in
  artifact_scorer.py — penalty-based weighted scoring, CRITICAL/HIGH/MEDIUM/LOW
  severity grades, A–F letter grade scale
- BibliographyScorer (5 dims): DOI coverage, tier distribution, tag density,
  completeness, source density
- GovernanceReceiptScorer (5 dims): completeness, chain_of_custody,
  timestamp_validity, evidence_ratio, schema_compliance — EU AI Act Article 12
  + NIST AI RMF GOVERN 1.2 alignment
- ArcanaEssayScorer (5 dims): empirical_grounding, citation_density,
  structural_integrity, source_diversity, on_topic_ratio — adversarial gate
  blocks echo-chamber synthesis (grade F) from entering CLP ledger; rules
  ARC101–502 including ARC304 (zero-citation block)
- All three scorers registered in DEFAULT_REGISTRY via scorers/__init__.py
- 51 tests green (35 existing + 16 new for ArcanaEssayScorer)

Closes ARCANA → Arbiter gate → CLP ingest flywheel.
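The ABC-plus-registry shape this commit describes might look roughly like the following; method names beyond those the commit message itself mentions are assumptions, not the real interface.

```python
from abc import ABC, abstractmethod

class ArtifactScorer(ABC):
    """Base-class sketch: each scorer declares the artifact type it handles."""
    artifact_type: str

    @abstractmethod
    def score(self, artifact: dict) -> dict:
        """Return a result dict (score, grade, findings)."""

class ArtifactScorerRegistry:
    """Maps artifact types to scorer instances."""

    def __init__(self) -> None:
        self._scorers: dict[str, ArtifactScorer] = {}

    def register(self, scorer: ArtifactScorer) -> None:
        self._scorers[scorer.artifact_type] = scorer

    def get(self, artifact_type: str) -> ArtifactScorer:
        if artifact_type not in self._scorers:
            raise KeyError(f"no scorer registered for {artifact_type!r}")
        return self._scorers[artifact_type]

# scorers/__init__.py then registers all three scorers into this instance.
DEFAULT_REGISTRY = ArtifactScorerRegistry()
```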

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Claude (agent) and others added 2 commits April 16, 2026 07:36
…lock self-grade

Self-grade jumps 85.1 → 90.6 (B → A), clearing the >=90 CI gate.

- Extract per-dimension helpers from BibliographyScorer.score (CC 29 → ~6)
- Auto-fix 7 ruff findings (F401 unused imports, F541, F841, asdict)
- Manually rename `l` → `link` (E741) in citation parsing
- Drop dead `datetime.date.today().year` line

Lint:       88.0 → 100.0
Complexity: 72.6 →  76.6
Overall:    85.1 →  90.6 (A)

138/138 tests pass. No behavioral change — all 51 artifact_scorer tests green.

Follow-up (not blocking): arcana_essay_scorer.score (CC 47) and
governance_receipt_scorer.score (CC 46) still flag complexity findings.
Same extract-helpers pattern applies if/when margin tightens.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
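The extract-per-dimension refactor this commit applies to BibliographyScorer amounts to turning one branch-heavy `score()` into a dispatch loop over small helpers, which is what drops cyclomatic complexity without changing behavior. Helper bodies and rule ids below are hypothetical:

```python
class BibliographyScorerSketch:
    """Sketch: score() dispatches to one small helper per dimension."""

    def score(self, bib: dict) -> list:
        findings = []
        for check in (self._check_completeness, self._check_doi_coverage):
            findings.extend(check(bib))
        return findings

    def _check_completeness(self, bib: dict) -> list:
        # Hypothetical rule id: an empty bibliography is a HIGH finding.
        return [] if bib.get("entries") else [("BIB-EMPTY", "HIGH")]

    def _check_doi_coverage(self, bib: dict) -> list:
        # Hypothetical rule id: entries present but none carries a DOI.
        entries = bib.get("entries", [])
        if entries and not any(e.get("doi") for e in entries):
            return [("BIB-NO-DOI", "MEDIUM")]
        return []
```

Each helper stays near CC 1-3, so the aggregate method's complexity is just the loop.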
… diff gate

Diff score 79.7 → 97.3 (clears --fail-under 80).
Full repo  90.6 → 97.7 (still clears --fail-under 90).

- ArcanaEssayScorer.score: CC 47 → 11. Extracted: _check_structure,
  _check_citation_density, _check_empirical_grounding, _check_source_diversity,
  _check_on_topic.
- GovernanceReceiptScorer.score: CC 46 → small. Extracted: _check_completeness,
  _check_chain_of_custody, _check_primary_timestamp, _check_chain_timestamps,
  _check_evidence, _check_schema.

No behavioral change — all 138 tests pass (51 artifact_scorer tests cover both).
Same extract-per-dimension pattern used for bibliography_scorer in b4ba862.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hummbl-dev hummbl-dev merged commit f358a9c into main Apr 16, 2026
3 checks passed
@hummbl-dev hummbl-dev deleted the feat/claude/artifact-scorer-suite branch April 16, 2026 11:56
hummbl-dev pushed a commit that referenced this pull request Apr 16, 2026
The artifact scorers (BibliographyScorer, GovernanceReceiptScorer,
ArcanaEssayScorer) shipped in PR #8 but had no CLI surface — they were
only callable from Python. This wires them into the CLI:

  arbiter score-artifact --list-types          # show registered scorers
  arbiter score-artifact <file.json>           # auto-detect type from JSON
  arbiter score-artifact <file> --type X       # explicit type
  arbiter score-artifact <file> --json         # machine output
  arbiter score-artifact <file> --fail-under N # CI gate

Type resolution: explicit --type wins; otherwise reads top-level
"artifact_type" field from the JSON.

Closes #10 item 5 partially:
- registry exposed via CLI, all 3 scorers usable end-to-end
- explicit type or in-file artifact_type works
- audit-fleet auto-detection (find *.bib files in repos, etc.) is
  a separate follow-up — left open in #10

Tests: 13 new in tests/test_score_artifact_cli.py covering --list-types,
explicit + inferred type, JSON output, all error paths, --fail-under.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>