Integrate Langfuse observability into VQA evaluation and update tracing replacing Opik as default #34
Merged
aravind-3105 merged 9 commits into main on Mar 19, 2026
…ng Opik references and updating tracing mechanisms
…e Langfuse integration, and tidy up code formatting
Pull request overview
This PR migrates the Agentic ChartQAPro VQA evaluation framework’s observability layer from Opik to Langfuse, updating tracing, score logging, retroactive ingestion, and documentation/notebooks to reflect the new integration.
Changes:
- Replaced Opik tracing + clients with Langfuse equivalents across pipeline runners, agents, tools, and evaluation passes.
- Added Langfuse integration modules (client, tracing wrappers, ingestion) and removed the Opik integration package.
- Updated docs and notebooks to reference Langfuse setup, environment variables, and `lf_trace_id` in MEPs.
Reviewed changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `uv.lock` | Removes Opik and its transitive deps; locks Langfuse and related deps. |
| `pyproject.toml` | Removes Opik from the agentic-xai-eval dependency group. |
| `implementations/mechanistic_interpretability/README.md` | Trims trailing whitespace. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/tools/vision_qa_tool.py` | Switches tracing import + tool trace handle field to Langfuse. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/tools/ocr_reader_tool.py` | Switches tracing import + tool trace handle field to Langfuse. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/runner/run_generate_meps.py` | Uses Langfuse client/dataset/prompts/tracing; records lf_trace_id; updates trace output. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/opik_integration/tracing.py` | Deleted (replaced by Langfuse tracing wrappers). |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/opik_integration/ingest.py` | Deleted (replaced by Langfuse ingestion). |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/opik_integration/client.py` | Deleted (replaced by Langfuse client). |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/opik_integration/__init__.py` | Deleted (package removed). |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/mep/schema.py` | Renames opik_trace_id to lf_trace_id in the MEP schema. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/tracing.py` | Adds Langfuse v4 tracing wrappers (sample_trace, open_llm_span, close_span, scoring). |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/prompts.py` | Updates prompt management from Opik to Langfuse APIs + CLI instructions. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/ingest.py` | Adds retroactive ingestion from MEP JSON into Langfuse traces + scoring. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/dataset.py` | Updates dataset registration to Langfuse dataset/items APIs. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/client.py` | Adds Langfuse client singleton keyed off LANGFUSE_PUBLIC_KEY/SECRET_KEY. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/__init__.py` | Adds Langfuse integration package marker. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/eval/eval_topk.py` | Logs top-k hit metrics back to Langfuse trace scores. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/eval/eval_outputs.py` | Logs accuracy/judge/latency scores back to Langfuse trace scores. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/eval/error_taxonomy.py` | Logs taxonomy failure category scores back to Langfuse trace scores. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/eval/dashboard.py` | Updates default MEP directory path in Streamlit UI. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/agents/vision_agent.py` | Renames trace parameter to lf_trace and swaps tracing imports to Langfuse. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/agents/verifier_agent.py` | Renames trace parameter to lf_trace and swaps tracing imports to Langfuse. |
| `implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/agents/planner_agent.py` | Renames trace parameter to lf_trace and swaps tracing imports to Langfuse. |
| `implementations/agentic_vqa_eval/run_pipeline.ipynb` | Updates “health check” and pipeline cells to use Langfuse client + env vars + lf_trace_id. |
| `implementations/agentic_vqa_eval/analysis.ipynb` | Makes prerequisite commands robust to execution from any repo directory; fixes empty-failure edge case. |
| `implementations/agentic_vqa_eval/README.md` | Rewrites observability docs from Opik → Langfuse and updates example commands/IDs. |
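Since the eval passes log scores back to the Langfuse trace and the MEP schema renames `opik_trace_id` to `lf_trace_id`, a reader of older MEP files has to decide what to do with the legacy key. The sketch below is one possible approach, not the code in this PR: it assumes the trace ID is a top-level field of the MEP JSON, and `trace_id_for_scoring` and the `sample_id` field are hypothetical names used only for illustration.

```python
import json
from typing import Optional


def trace_id_for_scoring(mep: dict) -> Optional[str]:
    """Return the Langfuse trace ID to attach scores to, or None.

    Assumption: the ID lives at the top level of the MEP dict.
    A legacy `opik_trace_id` is ignored on purpose, because it does not
    identify a Langfuse trace and scores could not be attached to it.
    """
    trace_id = mep.get("lf_trace_id")
    return trace_id if trace_id else None


# Illustrative MEP fragments (field values are made up).
new_mep = json.loads('{"sample_id": "s1", "lf_trace_id": "abc123"}')
old_mep = json.loads('{"sample_id": "s0", "opik_trace_id": "legacy42"}')
untraced_mep = json.loads('{"sample_id": "s2", "lf_trace_id": null}')

new_id = trace_id_for_scoring(new_mep)
old_id = trace_id_for_scoring(old_mep)
untraced_id = trace_id_for_scoring(untraced_mep)
```

With this shape, an eval pass can skip score logging whenever the helper returns `None`, which covers both untraced runs and MEPs produced before the migration.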
Comments suppressed due to low confidence (2)
implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/dataset.py:50
- The list comprehension here is used only for side effects (creating dataset items) and builds an unnecessary list. Prefer a simple `for` loop so failures can be handled/continued per item and to avoid extra memory usage.
implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/dataset.py:46
- `register_dataset()` always calls `client.create_dataset(name=...)`. If the dataset already exists (e.g., you re-run an experiment with the same split), this will likely raise and prevent runs from starting. Consider making this idempotent by checking for existence first or by catching/ignoring the specific "already exists" error.
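Both suppressed comments can be addressed together. The following is a minimal sketch against an in-memory stub rather than the real Langfuse client: `FakeClient`, `DatasetExistsError`, and the `create_dataset_item` signature are assumptions made for illustration, and the actual exception raised by the SDK on a duplicate dataset would need to be checked before adapting this.

```python
from typing import Any, Iterable


class DatasetExistsError(Exception):
    """Hypothetical stand-in for whatever the real client raises."""


class FakeClient:
    """In-memory stub; mimics only the two calls used below."""

    def __init__(self) -> None:
        self.datasets: dict[str, list[dict[str, Any]]] = {}

    def create_dataset(self, name: str) -> None:
        if name in self.datasets:
            raise DatasetExistsError(name)
        self.datasets[name] = []

    def create_dataset_item(self, dataset_name: str, **item: Any) -> None:
        if item.get("input") is None:
            raise ValueError("item has no input")
        self.datasets[dataset_name].append(item)


def register_dataset(
    client: FakeClient, name: str, items: Iterable[dict[str, Any]]
) -> int:
    """Create the dataset if needed, then add items one at a time.

    Returns the number of items added successfully.
    """
    try:
        client.create_dataset(name=name)
    except DatasetExistsError:
        pass  # re-run with the same split: reuse the existing dataset

    added = 0
    for item in items:  # plain loop instead of a side-effect comprehension
        try:
            client.create_dataset_item(dataset_name=name, **item)
            added += 1
        except ValueError as exc:
            # One bad item no longer aborts the whole registration.
            print(f"skipping item: {exc}")
    return added


client = FakeClient()
samples = [{"input": "q1", "expected_output": "a1"}, {"input": None}]
first = register_dataset(client, "chartqapro-test", samples)
second = register_dataset(client, "chartqapro-test", samples)  # does not raise
```

In this stub a second run appends the same items again; deduplicating items across runs is a separate decision that the sketch leaves open.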
Resolved review comments (collapsed):
- implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/tools/ocr_reader_tool.py (outdated)
- implementations/agentic_vqa_eval/src/agentic_chartqapro_eval/langfuse_integration/tracing.py
…nd update tracing parameters for observability.
… missing attributes propagation
ffab700 to 335b468
a860acf to 926c088
Author (Member) commented:
Added missing instrumentation for the various model providers (GenAI and OpenAI) to capture detailed traces, based on @fcogidi's feedback.
Summary
This pull request migrates the observability and tracing integration in the Agentic VQA Evaluation framework from Opik to Langfuse. Documentation, code references, environment variables, and integration logic now use Langfuse as the LLM observability platform. The migration covers user instructions, dependency management, file structure, and all agent classes.
Clickup Ticket(s): Link(s) if applicable.
Type of Change
Changes Made
Migration from Opik to Langfuse:
- Opik references in the documentation (`README.md`) have been replaced with Langfuse, including setup instructions, feature descriptions, and environment variables.
- The integration moves from `opik_integration` to `langfuse_integration`, and all code references in agent classes (`planner_agent.py`, `vision_agent.py`, `verifier_agent.py`) have been updated accordingly.
- `opik_trace` is renamed to `lf_trace` to reflect the new integration, so all tracing and span operations now use Langfuse.
- `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` now control observability, replacing the previous Opik variables. Documentation and code reflect that Langfuse is optional and the framework runs unchanged if these variables are not set.
- MEPs now record `lf_trace_id` instead of `opik_trace_id`.

Dependency and setup updates:
- The setup instructions now reference `ref6-agentic-xai-eval` instead of `agentic-xai-eval`.
- Dependencies use `langfuse` instead of `opik`.

These changes move the framework's tracing and experiment comparison onto Langfuse for LLM observability.
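The "optional when the variables are not set" behavior can be pictured roughly as follows. This is a sketch under assumptions, not the contents of `langfuse_integration/client.py`: `make_optional_client` and `log_score` are hypothetical names, the client factory is injected so the example runs without the Langfuse SDK installed, and the singleton caching mentioned in the file summary is left out.

```python
import os
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")

# The two variables named in this PR as the switch for observability.
REQUIRED_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def make_optional_client(
    factory: Callable[[], T],
    env: Mapping[str, str] = os.environ,
) -> Optional[T]:
    """Build a client only when both keys are set; otherwise return None."""
    if not all(env.get(var) for var in REQUIRED_VARS):
        return None
    return factory()


def log_score(
    client: Optional[object], trace_id: Optional[str], name: str, value: float
) -> bool:
    """Return False without doing anything when observability is off."""
    if client is None or trace_id is None:
        return False
    # A real implementation would call the Langfuse SDK here (assumption).
    return True


# `object` stands in for the real client constructor in this sketch.
disabled = make_optional_client(object, env={})
enabled = make_optional_client(
    object,
    env={"LANGFUSE_PUBLIC_KEY": "pk-example", "LANGFUSE_SECRET_KEY": "sk-example"},
)
```

Callers then pass the possibly-`None` client through unchanged, so the pipeline code path is the same whether or not the keys are configured.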
Testing
- `uv run pytest tests/`
- `uv run mypy <src_dir>`
- `uv run ruff check src_dir/`

Manual testing details:
Screenshots/Recordings
Related Issues
Deployment Notes
Checklist