[codex] Add RAG API, roadmap, and evaluation harness #10

Merged
2002yy merged 4 commits into main from codex/rag-pipeline-hardening on Jun 5, 2026

2002yy commented Jun 4, 2026

What changed

  • Hardened the news pipeline with a feed registry, URL normalization and redirect tracing, audit artifacts, and CI/package guard updates.
  • Added local RAG support for Markdown/TXT/DOCX/PDF, lexical/vector-prototype/hybrid retrieval, Streamlit RAG controls, and FastAPI health/RAG endpoints.
  • Recorded the next RAG roadmap in docs/RAG.md and started P4-A with an LLM-free retrieval evaluation harness over gold fixtures.
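The description lists lexical, vector-prototype, and hybrid retrieval but does not say how the hybrid mode combines the two signals. A minimal sketch, assuming a weighted sum of min-max-normalized scores: the term-count scorer, the bag-of-words cosine standing in for an embedding model, and the names `hybrid_search` and `alpha` are all hypothetical, not the project's implementation.

```python
from __future__ import annotations

import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer (illustrative assumption)."""
    return text.lower().split()


def lexical_score(query: str, doc: str) -> float:
    """Toy lexical score: total occurrences of distinct query terms in the doc."""
    counts = Counter(tokenize(doc))
    return float(sum(counts[term] for term in set(tokenize(query))))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity of two sparse term-count vectors (embedding stand-in)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def min_max(scores: list[float]) -> list[float]:
    """Rescale to [0, 1]; all-equal scores carry no ranking signal, so map them to 0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_search(
    query: str, docs: list[str], alpha: float = 0.5, k: int = 3
) -> list[tuple[int, float]]:
    """Return top-k (doc index, fused score) pairs; alpha weights the lexical side."""
    if not docs:
        return []
    lexical = min_max([lexical_score(query, d) for d in docs])
    q_vec = Counter(tokenize(query))
    vector = min_max([cosine(q_vec, Counter(tokenize(d))) for d in docs])
    fused = [alpha * lx + (1 - alpha) * vx for lx, vx in zip(lexical, vector)]
    # sorted() is stable (also with reverse=True), so ties keep document order.
    return sorted(enumerate(fused), key=lambda pair: pair[1], reverse=True)[:k]
```

Setting `alpha` to 1.0 or 0.0 reduces this to pure lexical or pure vector ranking; reciprocal rank fusion is a common alternative when the raw scores of the two retrievers are not comparable.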

Why

The project needed documentation-to-code consistency plus a concrete path from a working RAG MVP toward measurable retrieval quality before adding larger frontend or vector-store changes.
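An LLM-free harness can score retrieval against gold fixtures using standard ranking metrics alone. The sketch below assumes each fixture pairs a query with a set of relevant document IDs and reports mean recall@k and mean reciprocal rank (MRR); `GoldCase`, `evaluate`, and the `retrieve` callback are hypothetical names, not the harness added in this PR.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldCase:
    """Hypothetical gold fixture: a query plus the doc IDs it should retrieve."""

    query: str
    relevant: frozenset[str]


def recall_at_k(retrieved: list[str], relevant: frozenset[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: frozenset[str]) -> float:
    """1 / rank of the first relevant hit, or 0.0 if nothing relevant was found."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    cases: list[GoldCase], retrieve: Callable[[str], list[str]], k: int = 5
) -> dict[str, float]:
    """Average recall@k and reciprocal rank over all gold cases; no LLM involved."""
    if not cases:
        raise ValueError("need at least one gold case")
    recalls: list[float] = []
    ranks: list[float] = []
    for case in cases:
        retrieved = retrieve(case.query)
        recalls.append(recall_at_k(retrieved, case.relevant, k))
        ranks.append(reciprocal_rank(retrieved, case.relevant))
    n = len(cases)
    return {f"recall@{k}": sum(recalls) / n, "mrr": sum(ranks) / n}
```

Because the retriever is passed in as a plain callable, the same fixtures can compare the lexical, vector, and hybrid modes side by side.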

Validation

  • python -m pytest -q: 262 passed
  • python -m ruff check .: passed
  • detect-secrets scan ...: results empty
  • git diff --check: passed with CRLF warnings only
  • python tools/package_project_helper.py . NUL 0: OK
  • python -m mypy --explicit-package-bases src: 18 pre-existing soft-check errors remain

2002yy marked this pull request as ready for review on June 5, 2026 at 09:25
2002yy merged commit b400fd9 into main on Jun 5, 2026
2 checks passed
2002yy deleted the codex/rag-pipeline-hardening branch on June 5, 2026 at 09:26

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4bbccc5354

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on src/api.py, lines +69 to +72
target = _index_path(request.index_path)
index = index_documents(
request.paths,
index_path=target,

P1: Restrict RAG filesystem paths

If the FastAPI service is reachable by an untrusted client, this endpoint is exposed: it takes both the server-side document paths and the output index_path directly from the request. A caller can therefore index any readable .md/.txt/.docx/.pdf file the process can access, and can point index_path at any writable file, which safe_write_text will then overwrite with the JSON index. Confine these paths to a controlled directory, or reject absolute and parent-relative (..) paths, before passing them to index_documents.
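One way to apply this suggestion, sketched under assumptions: resolve each client-supplied path against a fixed base directory and reject anything that is absolute or that escapes the base once `..` segments and symlinks are resolved. `resolve_under` is a hypothetical helper (not in src/api.py), and `Path.is_relative_to` requires Python 3.9+.

```python
from __future__ import annotations

from pathlib import Path


def resolve_under(base_dir: Path, user_path: str) -> Path:
    """Resolve a client-supplied path, refusing anything outside base_dir.

    Hypothetical helper for illustration; not part of the project's API.
    """
    candidate = Path(user_path)
    # A non-empty anchor means an absolute path ("/etc/passwd") or, on Windows,
    # a drive-qualified one such as "C:foo"; reject both outright.
    if candidate.anchor:
        raise ValueError(f"absolute paths are not allowed: {user_path!r}")
    base = base_dir.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # containment check below sees the real target location.
    resolved = (base / candidate).resolve()
    if not resolved.is_relative_to(base):
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return resolved
```

The endpoint could then run both request.paths and request.index_path through such a check before index_documents is called; keeping the index output in its own dedicated directory would also keep safe_write_text away from unrelated files.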

Useful? React with 👍 / 👎.
