Numeric citation references [N, p. X] with reference list #36

Merged: markgewhite merged 4 commits into main from feature/26-numeric-citation-references on Apr 12, 2026

@markgewhite
Owner

Summary

  • Replaced verbose inline citations (full filenames) with compact numeric references [N, p. X] and a numbered reference list appended to each answer
  • Each unique document gets one reference number, so the same document cited on different pages shares that number
  • Reference list uses relative paths from the configured folder
  • Source elements in the UI now use the [N] path, p. X format
  • Updated system prompt to instruct the LLM to use numeric citation format
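
The numbering scheme in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names build_ref_map and build_reference_list come from the commit notes, but their signatures, the shape of the source dictionaries (`path`, `page`), and the helper `format_source_label` are assumptions made for the example.

```python
from pathlib import PurePosixPath


def build_ref_map(sources, folder):
    """Assign one reference number per unique document, in first-seen order.

    Assumed input: each source is a dict with a "path" key (hypothetical shape).
    Keys of the returned map are paths relative to the configured folder.
    """
    ref_map = {}
    for src in sources:
        rel = PurePosixPath(src["path"]).relative_to(folder).as_posix()
        if rel not in ref_map:
            ref_map[rel] = len(ref_map) + 1
    return ref_map


def build_reference_list(ref_map):
    """Render the numbered reference list, one "[N] path" line per document."""
    ordered = sorted(ref_map.items(), key=lambda item: item[1])
    return "\n".join(f"[{number}] {path}" for path, number in ordered)


def format_source_label(src, ref_map, folder):
    """Hypothetical helper: the "[N] path, p. X" label for a UI source element."""
    rel = PurePosixPath(src["path"]).relative_to(folder).as_posix()
    return f"[{ref_map[rel]}] {rel}, p. {src['page']}"
```

With two chunks from the same file on different pages, both map to the same number, and only the page part of the label differs.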

Closes #26

Test plan

  • 16 tests in test_answerer.py — all pass (12 new, 4 updated)
  • Full suite: 104 tests pass, zero regressions
  • Manual: run app, ask a question, verify [N, p. X] inline citations and reference list appear
  • Manual: verify same document cited on different pages shares same number

🤖 Generated with Claude Code

markgewhite and others added 4 commits April 12, 2026 14:38
Replaced verbose inline citations (full filenames) with compact numeric
references and a numbered reference list appended to each prompt. Same
document cited on different pages shares the same reference number.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CrossEncoder was failing to load because `bge-reranker-v2-m3` is not
a valid HuggingFace repo ID — the correct identifier is `BAAI/bge-reranker-v2-m3`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
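
For illustration, loading the reranker with the full repo ID might look like the sketch below. It assumes the CrossEncoder class is the one from the sentence-transformers package, which the commit message does not state; the constant and function names are hypothetical.

```python
# Full HuggingFace repo ID, including the "BAAI/" organisation prefix.
RERANKER_MODEL_ID = "BAAI/bge-reranker-v2-m3"


def load_reranker(model_id=RERANKER_MODEL_ID):
    """Load the cross-encoder reranker (assumes sentence-transformers is installed)."""
    # Imported lazily so this module can be imported without the dependency.
    from sentence_transformers import CrossEncoder

    return CrossEncoder(model_id)
```

Calling `load_reranker()` downloads the model on first use, so it needs network access or a populated local cache.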
- Reference list is now appended programmatically after the streamed answer
  (not relying on the LLM to reproduce it)
- Reference numbers persist across answers in the same session, so follow-up
  queries with new documents continue the numbering sequence
- Made build_ref_map and build_reference_list public for use by app.py
- System prompt now instructs the LLM not to reproduce the reference list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
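
A rough sketch of the two behaviours described in this commit, session-wide numbering and programmatic appending, is given below. The names `extend_ref_map` and `finalize_answer`, the use of plain path strings as keys, and the "References" heading are assumptions for the example rather than details taken from the PR.

```python
def extend_ref_map(ref_map, doc_paths):
    """Add unseen documents to a session-level map, continuing the numbering."""
    next_number = max(ref_map.values(), default=0) + 1
    for path in doc_paths:
        if path not in ref_map:
            ref_map[path] = next_number
            next_number += 1
    return ref_map


def finalize_answer(streamed_text, ref_map, cited_paths):
    """Append the reference list after the streamed answer has finished.

    The list is built from ref_map, not from anything the LLM produced.
    """
    ordered = sorted(set(cited_paths), key=ref_map.get)
    lines = [f"[{ref_map[path]}] {path}" for path in ordered]
    return streamed_text.rstrip() + "\n\nReferences\n" + "\n".join(lines)
```

Keeping a single map for the whole session means a document cited in an earlier answer keeps its number in follow-up answers, while new documents take the next free number.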
The LLM sometimes generates its own reference list despite being told not
to. This adds a regex-based strip that removes any trailing reference
section from the streamed output before the programmatic one is appended.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
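
The regex used in the PR is not shown in this conversation. As a hedged sketch of the idea, a strip like the one below removes a trailing "References" or "Sources" section made up only of `[N] ...` entries; the pattern, the accepted headings, and the function name `strip_trailing_references` are all assumptions.

```python
import re

# A heading line ("References", "## Sources:", "**References**", ...) followed
# only by "[N] ..." entries or blank lines, running to the end of the text.
_TRAILING_REFS = re.compile(
    r"\n[ \t]*(?:#{1,6}[ \t]*|\*\*)?(?:References|Sources)(?:\*\*)?:?[ \t]*\n"
    r"(?:[ \t]*(?:[-*][ \t]*)?\[\d+\][^\n]*(?:\n|$)|[ \t]*\n)*\Z",
    re.IGNORECASE,
)


def strip_trailing_references(text):
    """Remove an LLM-generated reference section from the end of an answer."""
    return _TRAILING_REFS.sub("", text).rstrip()
```

Because the pattern is anchored to the end of the text and only accepts `[N]` entry lines after the heading, inline citations such as [1, p. 2] in the body of the answer are left alone.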
@markgewhite markgewhite merged commit 4f5cdfe into main Apr 12, 2026
1 check passed
Development

Successfully merging this pull request may close these issues.

Numeric citation references [1], [2] with reference list at end
