Conversation

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
- Configuration used: Organization UI
- Review profile: CHILL
- Plan: Pro
- Run ID:
- 📒 Files selected for processing (1)
- ✅ Files skipped from review due to trivial changes (1)

Walkthrough: Added two new RAG evaluation docs.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🧹 Nitpick comments (2)
docs/public/ragas-rag-eval.ipynb (2)
41-45: Add commented version-pinned installation example. The documentation (`ragas.mdx` line 107) emphasizes version pinning for reproducibility, but the notebook installs packages without version constraints. Consider adding a commented alternative showing pinned versions.
📌 Suggested addition for version pinning
```diff
 # Use current kernel's Python so PATH does not point to another env
 # If download is slow, add: -i https://pypi.tuna.tsinghua.edu.cn/simple
+# For reproducible benchmarks, pin versions (example):
+# !{sys.executable} -m pip install "ragas==0.1.9" "datasets==2.18.0" "openai==1.12.0"
 import sys
 !{sys.executable} -m pip install "ragas" "datasets" "openai"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/public/ragas-rag-eval.ipynb` around lines 41 - 45, The notebook currently installs packages with a generic pip command using sys.executable ("!{sys.executable} -m pip install \"ragas\" \"datasets\" \"openai\"") which conflicts with the repo guidance to pin versions; add a commented alternative right after that line showing a version-pinned install (e.g., a commented pip command that pins specific versions for ragas, datasets, and openai) and include a short comment explaining this is for reproducibility and when to use it; keep the original unpinned command intact but make the pinned example clearly visible and updatable.
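The pinned versions the comment asks for can also be generated from the environment itself, so the commented example never goes stale. A minimal sketch using only the stdlib `importlib.metadata`; the `pinned_install_line` helper is hypothetical illustration, not part of the notebook, and the package names are the ones the notebook installs:

```python
# Sketch: capture the versions actually installed in the current kernel,
# so a pinned install line can be pasted into the commented alternative.
from importlib import metadata

def pinned_install_line(packages):
    """Build a 'pip install pkg==ver ...' line from installed versions."""
    pins = []
    for pkg in packages:
        try:
            pins.append(f'"{pkg}=={metadata.version(pkg)}"')
        except metadata.PackageNotFoundError:
            pins.append(f'"{pkg}"')  # not installed: leave unpinned
    return "pip install " + " ".join(pins)

print(pinned_install_line(["ragas", "datasets", "openai"]))
```

Running this once in the working environment yields the exact pins to paste into the notebook comment, rather than hand-picking version numbers.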
199-247: LGTM: Correct baseline metrics evaluation. The evaluation flow properly instantiates metrics with required dependencies (llm, embeddings), uses async `ascore()` calls with correct arguments, and computes both aggregate and per-row results.

Optional: Consider adding error handling for API failures. For production use, wrapping the scoring calls in try-except blocks could provide better resilience against transient API failures.
🛡️ Optional error handling pattern
```python
async def score_baseline_rows(ds):
    rows = ds.to_list()
    scored = []
    for idx, row in enumerate(rows):
        try:
            faithfulness_result = await faithfulness_metric.ascore(
                user_input=row["user_input"],
                response=row["response"],
                retrieved_contexts=row["retrieved_contexts"],
            )
            answer_relevancy_result = await answer_relevancy_metric.ascore(
                user_input=row["user_input"],
                response=row["response"],
            )
            scored.append({
                "user_input": row["user_input"],
                "faithfulness": faithfulness_result.value,
                "answer_relevancy": answer_relevancy_result.value,
            })
        except Exception as e:
            print(f"Error scoring row {idx}: {e}")
            # Optionally append None values or skip
    return scored
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/public/ragas-rag-eval.ipynb` around lines 199 - 247, Wrap the asynchronous scoring loop in score_baseline_rows with error handling so transient API/LLM failures don't crash the whole evaluation: inside score_baseline_rows (where faithfulness_metric.ascore and answer_relevancy_metric.ascore are called) add a try/except around the two await calls, log or print the exception with the row index and user_input, and decide on a stable fallback (e.g., append a result with None or sentinel values for "faithfulness" and "answer_relevancy", or skip the row) before continuing; ensure exceptions from either faithfulness_metric.ascore or answer_relevancy_metric.ascore are caught so the loop continues for remaining rows.
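Beyond catch-and-skip, transient API errors are often worth retrying before giving up on a row. A generic retry-with-backoff sketch, independent of the ragas API; `ascore_with_retry`, its parameters, and the fake `flaky_metric` below are illustrative assumptions, not code from the notebook:

```python
import asyncio

async def ascore_with_retry(ascore_fn, *, max_tries=3, base_delay=0.1, **kwargs):
    """Await an async metric call, retrying failures with exponential backoff."""
    for attempt in range(1, max_tries + 1):
        try:
            return await ascore_fn(**kwargs)
        except Exception:
            if attempt == max_tries:
                raise  # out of retries: surface the error to the caller
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))

# Illustrative fake metric that fails once, then succeeds.
calls = {"n": 0}
async def flaky_metric(**kwargs):
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient API error")
    return {"value": 0.9}

result = asyncio.run(ascore_with_retry(flaky_metric, user_input="q"))
print(result)  # {'value': 0.9} after one retry
```

Inside `score_baseline_rows`, each `await ...ascore(...)` call could be routed through such a wrapper so that a single flaky request does not cost the whole row.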
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/en/ragas.mdx`:
- Around line 71-88: The docs show embedding_factory(..., interface="modern")
but the notebook uses OpenAIEmbeddings(), causing inconsistent examples; pick
one pattern and make both examples match. Either update the notebook to
instantiate embeddings via embedding_factory(provider="openai",
model="your-embedding-model", client=client, interface="modern") to mirror the
MDX example, or update the MDX snippet to use OpenAIEmbeddings(...) (the same
constructor and client as the notebook); ensure references to embedding_factory
and OpenAIEmbeddings in the examples are consistent and keep llm_factory usage
unchanged.
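One way to keep the two examples from drifting again, whichever constructor the docs standardize on, is to route both through a single helper. Everything below is a hypothetical sketch: `build_embeddings` and `StubEmbeddings` are illustrative stand-ins, not ragas or OpenAI APIs:

```python
# Hypothetical sketch: a single choke point for constructing embeddings, so
# the MDX example and the notebook cannot diverge into different APIs.

def build_embeddings(factory, **kwargs):
    """Swap OpenAIEmbeddings <-> embedding_factory in exactly one place."""
    return factory(**kwargs)

class StubEmbeddings:
    """Stand-in for whichever embeddings class the docs standardize on."""
    def __init__(self, model="your-embedding-model"):
        self.model = model

    def embed_text(self, text):
        # Deterministic toy embedding for demonstration only.
        return [float(len(text)), float(len(text.split()))]

emb = build_embeddings(StubEmbeddings, model="your-embedding-model")
vec = emb.embed_text("hello world")
```

With this shape, switching the docs from `OpenAIEmbeddings` to `embedding_factory` (or back) means editing one call site instead of auditing every example.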
---
Nitpick comments:
In `@docs/public/ragas-rag-eval.ipynb`:
- Around line 41-45: The notebook currently installs packages with a generic pip
command using sys.executable ("!{sys.executable} -m pip install \"ragas\"
\"datasets\" \"openai\"") which conflicts with the repo guidance to pin
versions; add a commented alternative right after that line showing a
version-pinned install (e.g., a commented pip command that pins specific
versions for ragas, datasets, and openai) and include a short comment explaining
this is for reproducibility and when to use it; keep the original unpinned
command intact but make the pinned example clearly visible and updatable.
- Around line 199-247: Wrap the asynchronous scoring loop in score_baseline_rows
with error handling so transient API/LLM failures don't crash the whole
evaluation: inside score_baseline_rows (where faithfulness_metric.ascore and
answer_relevancy_metric.ascore are called) add a try/except around the two await
calls, log or print the exception with the row index and user_input, and decide
on a stable fallback (e.g., append a result with None or sentinel values for
"faithfulness" and "answer_relevancy", or skip the row) before continuing;
ensure exceptions from either faithfulness_metric.ascore or
answer_relevancy_metric.ascore are caught so the loop continues for remaining
rows.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 16c039f1-c2e4-4515-b3b6-b3a98288383c
📒 Files selected for processing (2)
docs/en/ragas.mdx
docs/public/ragas-rag-eval.ipynb
Deploying alauda-ai with

| Latest commit: | add5485 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://f91cec6e.alauda-ai.pages.dev |
| Branch Preview URL: | https://feat-ragas-docs.alauda-ai.pages.dev |