
add docs for RAGAS #160

Open
davidwtf wants to merge 2 commits into master from feat/ragas-docs

Conversation

@davidwtf
Contributor

@davidwtf davidwtf commented Mar 23, 2026

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive guide for evaluating RAG systems with the Ragas SDK: dataset field requirements, categories of metrics with what they measure, recommended modern workflow for scoring, prerequisites, and guidance for interpreting results.
    • Added an interactive Jupyter notebook demonstrating a complete end-to-end RAG evaluation flow with runnable examples, scoring patterns, aggregated metrics, per-row outputs, and troubleshooting tips.

@coderabbitai

coderabbitai bot commented Mar 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: feaccca0-703d-4b85-9ace-36ad87340d8d

📥 Commits

Reviewing files that changed from the base of the PR and between 3cfcebd and add5485.

📒 Files selected for processing (1)
  • docs/en/ragas.mdx
✅ Files skipped from review due to trivial changes (1)
  • docs/en/ragas.mdx

Walkthrough

Added two new RAG evaluation docs: a detailed docs/en/ragas.mdx guide describing dataset fields, metric categories, and the modern Ragas SDK scoring flow; and docs/public/ragas-rag-eval.ipynb, a runnable notebook with env config, client setup, async scoring, and aggregation examples.

Changes

Cohort / File(s): RAG Evaluation Documentation — docs/en/ragas.mdx, docs/public/ragas-rag-eval.ipynb
Summary: New documentation page and companion notebook describing the dataset schema (user_input, retrieved_contexts, response, reference), Ragas metric categories and their required arguments, the modern SDK flow using llm/embeddings clients, usage of ascore() (and score() for sync), prerequisites, troubleshooting, and result interpretation. The notebook includes environment-variable configuration, async OpenAI-compatible client construction, dataset creation, per-row async scoring, and aggregate metric computation.
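The per-row scoring plus aggregation flow the walkthrough describes can be sketched in plain Python. This is an illustrative sketch only: the row dicts and metric names below are stand-ins for what the notebook's per-row ascore() calls would produce, not code taken from the PR.

```python
# Sketch: aggregate per-row metric scores into dataset-level means.
# The rows below are illustrative; the real notebook produces them
# by awaiting each metric's ascore() for every dataset row.
scored_rows = [
    {"user_input": "q1", "faithfulness": 0.9, "answer_relevancy": 0.8},
    {"user_input": "q2", "faithfulness": 0.7, "answer_relevancy": 1.0},
]

def aggregate(rows, metrics=("faithfulness", "answer_relevancy")):
    """Mean of each metric over rows, skipping rows where a score is None."""
    summary = {}
    for m in metrics:
        values = [r[m] for r in rows if r.get(m) is not None]
        summary[m] = sum(values) / len(values) if values else None
    return summary

print(aggregate(scored_rows))
```

Skipping None values keeps the aggregation stable if some rows could not be scored, which dovetails with the fallback pattern suggested later in the review.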

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 I hopped through docs and a notebook bright,

Found contexts, references, metrics in sight.
Async calls hum, scores gather and beam,
A rabbit’s small cheer for a tidy RAG dream. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name: Title check
Status: ❓ Inconclusive
Explanation: The title 'add docs for RAGAS' is generic and vague. It describes the action of adding documentation but lacks specificity about what aspect of RAGAS is documented, failing to convey meaningful information about the changeset.
Resolution: Consider a more descriptive title such as 'Add documentation for RAG evaluation using Ragas SDK' to clearly communicate the primary purpose of the documentation.
✅ Passed checks (2 passed)
Check name: Description Check
Status: ✅ Passed
Explanation: Check skipped - CodeRabbit’s high-level summary is enabled.

Check name: Docstring Coverage
Status: ✅ Passed
Explanation: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
docs/public/ragas-rag-eval.ipynb (2)

41-45: Add commented version-pinned installation example.

The documentation (ragas.mdx line 107) emphasizes version pinning for reproducibility, but the notebook installs packages without version constraints. Consider adding a commented alternative showing pinned versions.

📌 Suggested addition for version pinning
 # Use current kernel's Python so PATH does not point to another env
 # If download is slow, add: -i https://pypi.tuna.tsinghua.edu.cn/simple
+# For reproducible benchmarks, pin versions (example):
+# !{sys.executable} -m pip install "ragas==0.1.9" "datasets==2.18.0" "openai==1.12.0"
 import sys
 !{sys.executable} -m pip install "ragas" "datasets" "openai"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/public/ragas-rag-eval.ipynb` around lines 41 - 45, The notebook
currently installs packages with a generic pip command using sys.executable
("!{sys.executable} -m pip install \"ragas\" \"datasets\" \"openai\"") which
conflicts with the repo guidance to pin versions; add a commented alternative
right after that line showing a version-pinned install (e.g., a commented pip
command that pins specific versions for ragas, datasets, and openai) and include
a short comment explaining this is for reproducibility and when to use it; keep
the original unpinned command intact but make the pinned example clearly visible
and updatable.

199-247: LGTM: Correct baseline metrics evaluation.

The evaluation flow properly instantiates metrics with required dependencies (llm, embeddings), uses async ascore() calls with correct arguments, and computes both aggregate and per-row results.

Optional: Consider adding error handling for API failures.

For production use, wrapping the scoring calls in try-except blocks could provide better resilience against transient API failures.

🛡️ Optional error handling pattern
async def score_baseline_rows(ds):
    rows = ds.to_list()
    scored = []
    for idx, row in enumerate(rows):
        try:
            faithfulness_result = await faithfulness_metric.ascore(
                user_input=row["user_input"],
                response=row["response"],
                retrieved_contexts=row["retrieved_contexts"],
            )
            answer_relevancy_result = await answer_relevancy_metric.ascore(
                user_input=row["user_input"],
                response=row["response"],
            )
            scored.append({
                "user_input": row["user_input"],
                "faithfulness": faithfulness_result.value,
                "answer_relevancy": answer_relevancy_result.value,
            })
        except Exception as e:
            print(f"Error scoring row {idx}: {e}")
            # Optionally append None values or skip
    return scored
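The try/except pattern above can be exercised end-to-end without network access by swapping the ascore() calls for a stand-in scorer. Everything below (flaky_score, the sample rows) is hypothetical illustration for testing the fallback logic, not code from the notebook:

```python
# Sketch of the fallback pattern: score each row, but record None instead
# of crashing when a metric call raises. flaky_score stands in for the
# notebook's `await metric.ascore(...)` calls.
def flaky_score(row):
    if not row.get("response"):
        raise ValueError("empty response")  # simulate a transient API failure
    return 1.0

def score_rows_with_fallback(rows):
    scored = []
    for idx, row in enumerate(rows):
        try:
            value = flaky_score(row)
        except Exception as exc:
            print(f"Error scoring row {idx}: {exc}")
            value = None  # stable fallback so later aggregation can skip it
        scored.append({"user_input": row["user_input"], "faithfulness": value})
    return scored

rows = [
    {"user_input": "q1", "response": "ok"},
    {"user_input": "q2", "response": ""},  # fails and falls back to None
]
print(score_rows_with_fallback(rows))
```

Appending a None sentinel rather than skipping the row keeps per-row output aligned with the input dataset, which makes it easier to see which rows failed.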
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/public/ragas-rag-eval.ipynb` around lines 199 - 247, Wrap the
asynchronous scoring loop in score_baseline_rows with error handling so
transient API/LLM failures don't crash the whole evaluation: inside
score_baseline_rows (where faithfulness_metric.ascore and
answer_relevancy_metric.ascore are called) add a try/except around the two await
calls, log or print the exception with the row index and user_input, and decide
on a stable fallback (e.g., append a result with None or sentinel values for
"faithfulness" and "answer_relevancy", or skip the row) before continuing;
ensure exceptions from either faithfulness_metric.ascore or
answer_relevancy_metric.ascore are caught so the loop continues for remaining
rows.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/en/ragas.mdx`:
- Around line 71-88: The docs show embedding_factory(..., interface="modern")
but the notebook uses OpenAIEmbeddings(), causing inconsistent examples; pick
one pattern and make both examples match. Either update the notebook to
instantiate embeddings via embedding_factory(provider="openai",
model="your-embedding-model", client=client, interface="modern") to mirror the
MDX example, or update the MDX snippet to use OpenAIEmbeddings(...) (the same
constructor and client as the notebook); ensure references to embedding_factory
and OpenAIEmbeddings in the examples are consistent and keep llm_factory usage
unchanged.

---

Nitpick comments:
In `@docs/public/ragas-rag-eval.ipynb`:
- Around line 41-45: The notebook currently installs packages with a generic pip
command using sys.executable ("!{sys.executable} -m pip install \"ragas\"
\"datasets\" \"openai\"") which conflicts with the repo guidance to pin
versions; add a commented alternative right after that line showing a
version-pinned install (e.g., a commented pip command that pins specific
versions for ragas, datasets, and openai) and include a short comment explaining
this is for reproducibility and when to use it; keep the original unpinned
command intact but make the pinned example clearly visible and updatable.
- Around line 199-247: Wrap the asynchronous scoring loop in score_baseline_rows
with error handling so transient API/LLM failures don't crash the whole
evaluation: inside score_baseline_rows (where faithfulness_metric.ascore and
answer_relevancy_metric.ascore are called) add a try/except around the two await
calls, log or print the exception with the row index and user_input, and decide
on a stable fallback (e.g., append a result with None or sentinel values for
"faithfulness" and "answer_relevancy", or skip the row) before continuing;
ensure exceptions from either faithfulness_metric.ascore or
answer_relevancy_metric.ascore are caught so the loop continues for remaining
rows.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 16c039f1-c2e4-4515-b3b6-b3a98288383c

📥 Commits

Reviewing files that changed from the base of the PR and between 904d33b and 3cfcebd.

📒 Files selected for processing (2)
  • docs/en/ragas.mdx
  • docs/public/ragas-rag-eval.ipynb

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Mar 23, 2026

Deploying alauda-ai with Cloudflare Pages

Latest commit: add5485
Status: ✅  Deploy successful!
Preview URL: https://f91cec6e.alauda-ai.pages.dev
Branch Preview URL: https://feat-ragas-docs.alauda-ai.pages.dev

View logs

