langsmith#1413

Open
kcoopermiller wants to merge 5 commits into main from cooper/langsmith
@kcoopermiller kcoopermiller commented May 19, 2026

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

Note

Add LangSmith tracing and navigation tool call logging to the Wikispeedia environment

  • Passes a LangSmith invoke config (a stable run_id derived from trajectory_id, plus run_name, metadata, and tags) into agent.ainvoke and stores the run ID on state.
  • Records each click_link and go_back invocation (name + validity) into state['navigation_tool_calls'] via a new record_navigation_tool_call helper.
  • Reworks count_tool_calls and invalid_link_rate to prefer the state log over completion transcripts, preventing double-counting and enabling metrics when completion is empty.
  • Adds a LANGSMITH_API_KEY check at environment load time when LANGSMITH_TRACING=true.
  • Bumps the verifiers dependency to >=0.1.15.dev7 in pyproject.toml.

Macroscope summarized 0f58011.
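
For reviewers, the "stable run_id derived from trajectory_id" item can be pictured with a minimal sketch like the one below. It is not the PR's code: the function names, the UUIDv5 derivation, the namespace, and the run_name/tag values are all assumptions made for illustration. The summary only says that the run ID is stable and derived from trajectory_id.

```python
import uuid

# Hypothetical namespace; the PR does not state how the ID is derived.
RUN_ID_NAMESPACE = uuid.NAMESPACE_URL


def stable_run_id(trajectory_id: str) -> uuid.UUID:
    """Map a trajectory ID to the same UUID on every call (UUIDv5 is deterministic)."""
    return uuid.uuid5(RUN_ID_NAMESPACE, f"wikispeedia/{trajectory_id}")


def build_invoke_config(trajectory_id: str) -> dict:
    """Assemble the fields named in the summary: run_id, run_name, metadata, tags."""
    return {
        "run_id": stable_run_id(trajectory_id),
        "run_name": "wikispeedia-episode",            # placeholder value
        "metadata": {"trajectory_id": trajectory_id},
        "tags": ["wikispeedia"],                      # placeholder value
    }


def remember_run_id(state: dict, config: dict) -> None:
    """Store the run ID on state so later steps can link back to the trace."""
    state["langsmith_run_id"] = str(config["run_id"])
```

Because the ID is a pure function of trajectory_id, retrying the same trajectory would map to the same trace ID instead of producing a fresh random one, which is presumably the point of making it stable.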


Note

Medium Risk
Medium risk because it changes how the harness invokes Deep Agents (always passing a LangGraph/LangSmith config with derived run_id) and alters metric computation by introducing state-based navigation tool logging, which can affect evaluation outputs.

Overview
Adds LangSmith/LangGraph tracing support to the Wikispeedia Deep Agents harness by passing a structured ainvoke config (a stable run_id derived from trajectory_id, plus run_name, thread_id, metadata, tags, and an optional recursion_limit) and storing state['langsmith_run_id']. When LANGSMITH_TRACING=true, the environment now requires LANGSMITH_API_KEY.
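
The LANGSMITH_API_KEY requirement can be sketched as a small load-time guard. This is an illustration under assumptions, not the PR's implementation: the function name, the exception type, the error message, and the case-insensitive comparison with "true" are all hypothetical.

```python
import os
from typing import Mapping, Optional


def require_langsmith_key(env: Optional[Mapping[str, str]] = None) -> None:
    """Fail fast if tracing is enabled but no API key is configured.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    tracing_on = env.get("LANGSMITH_TRACING", "").strip().lower() == "true"
    if tracing_on and not env.get("LANGSMITH_API_KEY"):
        raise RuntimeError(
            "LANGSMITH_TRACING=true but LANGSMITH_API_KEY is not set"
        )
```

Raising at environment load time surfaces a missing key before any rollout starts, rather than partway through an evaluation.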

Introduces explicit navigation tool-call logging (navigation_tool_calls) for click_link/go_back. Updates total_tool_calls and invalid_link_rate to prefer this log and, when it is present, to avoid double-counting navigation tools that also appear in completion. Includes new and updated tests and README guidance, and bumps the env’s verifiers dependency.
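
A rough sketch of how a state-based log can feed both metrics follows. The signatures, the record shape ({"name", "valid"}), and the exact metric definitions are assumptions for illustration; the PR's real helpers may differ, and the fallback of invalid_link_rate to the completion transcript is omitted here for brevity.

```python
NAV_TOOLS = {"click_link", "go_back"}


def record_navigation_tool_call(state: dict, name: str, valid: bool) -> None:
    """Append one navigation call (tool name + validity) to the state log."""
    state.setdefault("navigation_tool_calls", []).append(
        {"name": name, "valid": valid}
    )


def total_tool_calls(state: dict, completion_tool_names: list) -> int:
    """Prefer the state log for navigation tools; count other tools from completion."""
    log = state.get("navigation_tool_calls") or []
    if not log:
        # No log available: fall back to the completion transcript alone.
        return len(completion_tool_names)
    # Navigation calls are already in the log, so skip them in completion
    # to avoid counting the same call twice.
    non_nav = [n for n in completion_tool_names if n not in NAV_TOOLS]
    return len(log) + len(non_nav)


def invalid_link_rate(state: dict) -> float:
    """Fraction of logged click_link calls that were invalid (0.0 if none)."""
    clicks = [
        c for c in state.get("navigation_tool_calls", [])
        if c["name"] == "click_link"
    ]
    if not clicks:
        return 0.0
    return sum(not c["valid"] for c in clicks) / len(clicks)
```

With this layout the metrics still work when completion is empty, since the log lives on state and does not depend on the transcript.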

Reviewed by Cursor Bugbot for commit 0f58011.

@kcoopermiller kcoopermiller marked this pull request as ready for review May 19, 2026 20:32
macroscopeapp Bot previously approved these changes May 19, 2026

macroscopeapp Bot commented May 19, 2026

Approvability

Verdict: Needs human review

This PR adds LangSmith tracing integration and introduces a new tool call tracking mechanism that changes how metrics are calculated. The new runtime behavior and feature addition warrant human review.
