Skip to content

Latest commit

Β 

History

History
90 lines (60 loc) Β· 4.22 KB

File metadata and controls

90 lines (60 loc) Β· 4.22 KB

πŸ”¬ Open Deep Research - Enhanced with Gensee Search

Fork Notice: This is an enhanced version of LangChain's Open Deep Research, integrated with Gensee Search for improved search capabilities and reasoning. Check LangChain's repo to learn more about how it's built.

πŸš€ What's Different in This Version

  • πŸ” Gensee Search Integration: Replaced Tavily with Gensee Search for enhanced search quality and AI application optimization
  • 🧠 Improved Reasoning: Enhanced agent prompts to encourage more thorough search and reasoning processes
  • πŸ› οΈ Easy Integration: Demonstrates simple integration of Gensee's testing and optimization tools for GenAI applications

Learn more about Gensee's AI testing and optimization platform at gensee.ai

πŸš€ Quickstart

  1. Clone the repository and activate a virtual environment:
git clone https://github.com/GenseeAI/open_deep_research.git
cd open_deep_research
uv venv --python=3.12
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  1. Install dependencies:
uv sync
# or
uv pip install -r pyproject.toml
  1. Set up your .env file to customize the environment variables (for model selection, search tools, and other configuration settings):

Get FREE access to Gensee Search API from https://airesearch.gensee.ai/

cp .env.example .env
# GENSEE_API_KEY=your_api_key_here
  1. Launch agent with the LangGraph server locally:
# Install dependencies and start the LangGraph server
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking

This will open the LangGraph Studio UI in your browser.

- πŸš€ API: http://127.0.0.1:2024
- 🎨 Studio UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
- πŸ“š API Docs: http://127.0.0.1:2024/docs

Ask a question in the messages input field and click Submit. Select different configuration in the "Manage Assistants" tab.

βš™οΈ Configurations

See the fields in the run_evaluate.py to config the model usage and other agent behaviors.

πŸ“Š Evaluation

Open Deep Research is configured for evaluation with Deep Research Bench. This benchmark has 100 PhD-level research tasks (50 English, 50 Chinese), crafted by domain experts across 22 fields (e.g., Science & Tech, Business & Finance) to mirror real-world deep-research needs. It has 2 evaluation metrics, but the leaderboard is based on the RACE score. This uses LLM-as-a-judge (Gemini) to evaluate research reports against a golden set of reports compiled by experts across a set of metrics.

Usage

Warning: Running across the 100 examples can cost ~$20-$100 depending on the model selection.

# Run comprehensive evaluation on LangSmith datasets
python tests/run_evaluate.py

This will provide a link to a LangSmith experiment, which will have a name YOUR_EXPERIMENT_NAME. Once this is done, extract the results to a JSONL file that can be submitted to the Deep Research Bench.

python tests/extract_langsmith_data.py --project-name "YOUR_EXPERIMENT_NAME" --model-name "you-model-name" --dataset-name "deep_research_bench"

This creates tests/expt_results/deep_research_bench_model-name.jsonl with the required format. Move the generated JSONL file to a local clone of the Deep Research Bench repository and follow their Quick Start guide for evaluation submission.

Results

Name Summarization Research Compression Total Cost Total Tokens RACE Score
Gensee Search openai:gpt-4.1-mini openai:gpt-5 openai:gpt-4.1 $158.56 165,689,034 0.5079
LangChain GPT-5 openai:gpt-4.1-mini openai:gpt-5 openai:gpt-4.1 204,640,896 0.4943
LangChain Submission openai:gpt-4.1-nano openai:gpt-4.1 openai:gpt-4.1 $87.83 207,005,549 0.4344