Integrate Langfuse observability into VQA evaluation and update tracing replacing Opik as default #34

Merged
aravind-3105 merged 9 commits into main from langchain-modification on Mar 19, 2026

@aravind-3105
Member

Summary

This pull request migrates the observability and tracing integration of the Agentic VQA Evaluation framework from Opik to Langfuse. Documentation, code references, environment variables, and integration logic now use Langfuse as the LLM observability platform. The migration covers user instructions, dependency management, file structure, and all agent classes.

Clickup Ticket(s): Link(s) if applicable.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🔧 Refactoring (no functional changes)
  • ⚡ Performance improvement
  • 🧪 Test improvements
  • 🔒 Security fix

Changes Made

Migration from Opik to Langfuse:

  • All references to Opik in the documentation (README.md) have been replaced with Langfuse, including setup instructions, feature descriptions, and environment variables.
  • The observability integration folder has been renamed from opik_integration to langfuse_integration, and all code references in agent classes (planner_agent.py, vision_agent.py, verifier_agent.py) have been updated accordingly.
  • Function and argument names in agent classes have been changed from opik_trace to lf_trace to reflect the new integration, ensuring all tracing and span operations now use Langfuse.
  • Environment variable setup and usage have been updated: LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY now control observability, replacing the previous Opik variables. Documentation and code reflect that Langfuse is optional and the framework runs unchanged if these variables are not set.
  • Example MEP artifacts and references in the documentation now use lf_trace_id instead of opik_trace_id.
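
The optional-by-default behaviour described above could look roughly like the following. This is a minimal sketch, not the code in `langfuse_integration/client.py`: the function name `get_langfuse_client` is hypothetical, and the `Langfuse(public_key=..., secret_key=...)` constructor call is assumed to match the installed SDK version.

```python
import os
from functools import lru_cache
from typing import Any, Optional


@lru_cache(maxsize=1)
def get_langfuse_client() -> Optional[Any]:
    """Return a shared Langfuse client, or None when observability is off.

    Hypothetical helper: the name and return convention are assumptions.
    """
    public_key = os.environ.get("LANGFUSE_PUBLIC_KEY")
    secret_key = os.environ.get("LANGFUSE_SECRET_KEY")
    if not (public_key and secret_key):
        # Keys missing: callers treat None as "tracing disabled".
        return None
    try:
        # Imported lazily so the framework still runs without the package.
        from langfuse import Langfuse
    except ImportError:
        return None
    # Assumed constructor arguments; check them against the installed SDK.
    return Langfuse(public_key=public_key, secret_key=secret_key)
```

Because the result is cached, a `None` returned before the variables are set stays cached; calling `get_langfuse_client.cache_clear()` after changing the environment forces a fresh lookup.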

Dependency and setup updates:

  • The dependency group for installation has been renamed from agentic-xai-eval to ref6-agentic-xai-eval in the setup instructions.
  • The dependency table now lists langfuse instead of opik.

With these changes, Langfuse is the framework's default platform for LLM tracing and experiment comparison.

Testing

  • Tests pass locally (uv run pytest tests/)
  • Type checking passes (uv run mypy <src_dir>)
  • Linting passes (uv run ruff check <src_dir>/)
  • Manual testing performed (describe below)

Manual testing details:

Screenshots/Recordings

Related Issues

Deployment Notes

Checklist

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Documentation updated (if applicable)
  • No sensitive information (API keys, credentials) exposed

…ng Opik references and updating tracing mechanisms
@aravind-3105 self-assigned this on Mar 18, 2026
@aravind-3105 added the enhancement label (New feature or request) on Mar 18, 2026
@aravind-3105 requested a review from Copilot on March 19, 2026 13:55

Copilot AI left a comment

Pull request overview

This PR migrates the Agentic ChartQAPro VQA evaluation framework’s observability layer from Opik to Langfuse, updating tracing, score logging, retroactive ingestion, and documentation/notebooks to reflect the new integration.

Changes:

  • Replaced Opik tracing + clients with Langfuse equivalents across pipeline runners, agents, tools, and evaluation passes.
  • Added Langfuse integration modules (client, tracing wrappers, ingestion) and removed the Opik integration package.
  • Updated docs and notebooks to reference Langfuse setup, environment variables, and lf_trace_id in MEPs.

Reviewed changes

Copilot reviewed 26 out of 27 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `uv.lock` | Removes Opik and its transitive deps; locks Langfuse and related deps. |
| `pyproject.toml` | Removes Opik from the agentic-xai-eval dependency group. |
| `implementations/mechanistic_interpretability/README.md` | Trims trailing whitespace. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/tools/vision_qa_tool.py` | Switches tracing import + tool trace handle field to Langfuse. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/tools/ocr_reader_tool.py` | Switches tracing import + tool trace handle field to Langfuse. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/runner/run_generate_meps.py` | Uses Langfuse client/dataset/prompts/tracing; records lf_trace_id; updates trace output. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/opik_integration/tracing.py` | Deleted (replaced by Langfuse tracing wrappers). |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/opik_integration/ingest.py` | Deleted (replaced by Langfuse ingestion). |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/opik_integration/client.py` | Deleted (replaced by Langfuse client). |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/opik_integration/__init__.py` | Deleted (package removed). |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/mep/schema.py` | Renames opik_trace_id to lf_trace_id in the MEP schema. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/tracing.py` | Adds Langfuse v4 tracing wrappers (sample_trace, open_llm_span, close_span, scoring). |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/prompts.py` | Updates prompt management from Opik to Langfuse APIs + CLI instructions. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/ingest.py` | Adds retroactive ingestion from MEP JSON into Langfuse traces + scoring. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/dataset.py` | Updates dataset registration to Langfuse dataset/items APIs. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/client.py` | Adds Langfuse client singleton keyed off LANGFUSE_PUBLIC_KEY/SECRET_KEY. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/__init__.py` | Adds Langfuse integration package marker. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/eval/eval_topk.py` | Logs top-k hit metrics back to Langfuse trace scores. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/eval/eval_outputs.py` | Logs accuracy/judge/latency scores back to Langfuse trace scores. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/eval/error_taxonomy.py` | Logs taxonomy failure category scores back to Langfuse trace scores. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/eval/dashboard.py` | Updates default MEP directory path in Streamlit UI. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/agents/vision_agent.py` | Renames trace parameter to lf_trace and swaps tracing imports to Langfuse. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/agents/verifier_agent.py` | Renames trace parameter to lf_trace and swaps tracing imports to Langfuse. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/agents/planner_agent.py` | Renames trace parameter to lf_trace and swaps tracing imports to Langfuse. |
| `implementations/agentic_vqa_eval/run_pipeline.ipynb` | Updates “health check” and pipeline cells to use Langfuse client + env vars + lf_trace_id. |
| `implementations/agentic_vqa_eval/analysis.ipynb` | Makes prerequisite commands robust to execution from any repo directory; fixes empty-failure edge case. |
| `implementations/agentic_vqa_eval/README.md` | Rewrites observability docs from Opik → Langfuse and updates example commands/IDs. |
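
The evaluation passes in the file summary log scores back to the trace recorded in each MEP. A best-effort version of that step might look like the sketch below. The helper name `log_scores` is hypothetical, `lf_trace_id` is assumed to sit at the top level of the MEP dictionary, and the `create_score(trace_id=..., name=..., value=...)` call shape is an assumption to verify against the installed Langfuse SDK.

```python
from typing import Any, Mapping, Optional


def log_scores(client: Optional[Any], mep: Mapping[str, Any],
               scores: Mapping[str, float]) -> int:
    """Attach scores to the trace named in the MEP; return how many were sent.

    Hypothetical helper, not the PR's implementation.
    """
    trace_id = mep.get("lf_trace_id")
    if client is None or not trace_id:
        # Observability disabled, or this sample was never traced.
        return 0
    sent = 0
    for name, value in scores.items():
        try:
            # Assumed call shape; check it against the installed SDK version.
            client.create_score(trace_id=trace_id, name=name, value=float(value))
            sent += 1
        except Exception:
            # Scoring is best-effort: never fail an evaluation run over it.
            continue
    return sent
```

Returning a count rather than raising keeps evaluation runs independent of the observability backend, which matches the stated goal that the framework runs unchanged when Langfuse is not configured.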
Comments suppressed due to low confidence (2)

implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/dataset.py:50

  • The list comprehension here is used only for side effects (creating dataset items) and builds an unnecessary list. Prefer a simple for loop so failures can be handled/continued per item and to avoid extra memory usage.

implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/dataset.py:46

  • register_dataset() always calls client.create_dataset(name=...). If the dataset already exists (e.g., you re-run an experiment with the same split), this will likely raise and prevent runs from starting. Consider making this idempotent by checking for existence first or by catching/ignoring the specific "already exists" error.
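
Both suppressed suggestions can be combined in one sketch: check for the dataset before creating it, and add items in a plain loop so one bad item does not abort the rest. This is an illustration under assumptions, not the code in `dataset.py`: the item keys (`input`, `expected_output`) and the return value are hypothetical, and `get_dataset` is assumed to raise when the dataset does not exist.

```python
from typing import Any, Iterable, Mapping


def register_dataset(client: Any, name: str,
                     items: Iterable[Mapping[str, Any]]) -> tuple[int, int]:
    """Create the dataset if needed, then add items; return (added, failed)."""
    try:
        # Assumption: raises when the dataset is missing.
        client.get_dataset(name)
    except Exception:
        client.create_dataset(name=name)

    added = failed = 0
    for item in items:
        try:
            client.create_dataset_item(
                dataset_name=name,
                input=item["input"],
                expected_output=item.get("expected_output"),
            )
            added += 1
        except Exception:
            # Keep going: one malformed item should not block the run.
            failed += 1
    return added, failed
```

In real code the broad `except Exception` clauses would be better narrowed to the SDK's specific "not found" and request errors, so that unrelated failures such as bad credentials still surface.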

@aravind-3105 force-pushed the langchain-modification branch from ffab700 to 335b468 on March 19, 2026 15:43
@aravind-3105 force-pushed the langchain-modification branch from a860acf to 926c088 on March 19, 2026 21:45
@aravind-3105
Member Author

Added missing instrumentation for the various model providers (GenAI and OpenAI) to capture detailed traces, based on @fcogidi's feedback.

@aravind-3105 merged commit 74423d6 into main on Mar 19, 2026
2 checks passed
2 participants