benchmark(locomo-retrieval-only): add SochDB-backed NVIDIA hybrid baseline#6

Merged
sushanthpy merged 1 commit into sochdb:main from tatavishnurao:benchmark/locomo-retrieval-only-sochdb-nvidia on May 30, 2026

@tatavishnurao
Contributor

Summary

Adds a LoCoMo retrieval-only benchmark artifact for a SochDB-backed hybrid retrieval pipeline.

This benchmark evaluates evidence recovery, not answer-generation accuracy.

Configuration

  • Dataset: LoCoMo converted QA split
  • Questions: 1,986
  • Memory rows: 5,882 raw conversation turns
  • Sparse retriever: BM25
  • Dense embedding model: nvidia/llama-nemotron-embed-1b-v2
  • Dense vector backend: SochDB gRPC
  • Fusion: Reciprocal Rank Fusion (RRF)
  • k: 20
  • candidate_k: 100
  • bm25_weight: 1.5
  • vector_weight: 0.75
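The fusion settings above can be read as weighted Reciprocal Rank Fusion. The sketch below assumes that each retriever contributes `weight / (c + rank)` for every document it returns, that `candidate_k` caps each retriever's list before fusion, and that the smoothing constant `c` is 60 (the common default from the original RRF formulation). None of these details, nor the function name `weighted_rrf`, are confirmed by this PR.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, k=20, candidate_k=100, rrf_c=60):
    """Hypothetical weighted RRF over several ranked lists of turn ids.

    ranked_lists: retriever name -> list of ids, best first (e.g. "bm25", "vector")
    weights: retriever name -> weight (missing names default to 1.0)
    candidate_k: assumed per-retriever candidate depth before fusion
    rrf_c: assumed RRF smoothing constant; the benchmark's value is not stated
    """
    scores = defaultdict(float)
    for name, ids in ranked_lists.items():
        weight = weights.get(name, 1.0)
        # Only the first candidate_k results of each retriever take part in fusion.
        for rank, doc_id in enumerate(ids[:candidate_k], start=1):
            scores[doc_id] += weight / (rrf_c + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:k]]
```

For example, with `{"bm25": ["t3", "t1", "t7"], "vector": ["t1", "t9", "t3"]}` and equal weights, `t1` ranks first; with `{"bm25": 1.5, "vector": 0.75}` the top lexical hit `t3` moves ahead of it, which is the kind of shift a BM25 weight of 1.5 against a vector weight of 0.75 produces.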

Result

| System | n | Evidence Hit@20 | Evidence Recall@20 | Avg Context Tokens | Avg Latency (ms) |
|---|---|---|---|---|---|
| BM25 + SochDB vector + NVIDIA Nemotron + RRF | 1,986 | 0.7056 | 0.6522 | 657.98 | 200.81 |
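Evidence Hit@20 and Evidence Recall@20 are not formally defined in this PR. A plausible reading, assumed here: a question counts as a hit if at least one gold evidence turn appears among the top 20 retrieved turns, and its recall is the fraction of its gold evidence turns found there. The helper names and the choice to skip questions without annotated evidence are hypothetical.

```python
def evidence_hit_and_recall(retrieved_ids, evidence_ids, k=20):
    """Assumed per-question Evidence Hit@k and Evidence Recall@k."""
    top_k = set(retrieved_ids[:k])
    gold = set(evidence_ids)
    if not gold:
        return None  # hypothetical: questions without gold evidence are skipped
    found = len(top_k & gold)
    hit = 1.0 if found > 0 else 0.0
    recall = found / len(gold)
    return hit, recall


def mean_metrics(per_question):
    """Average (hit, recall) pairs over all scored questions."""
    scored = [m for m in per_question if m is not None]
    if not scored:
        raise ValueError("no scored questions")
    n = len(scored)
    return sum(h for h, _ in scored) / n, sum(r for _, r in scored) / n
```

Under these definitions Hit@k is always at least Recall@k for a question, which is consistent with every row in the tables of this PR.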

Category Breakdown

| Category | n | Evidence Hit@20 | Evidence Recall@20 | Notes |
|---|---|---|---|---|
| adversarial | 446 | 0.7534 | 0.7455 | Strong |
| temporal | 321 | 0.7563 | 0.7232 | Strong |
| open_domain | 841 | 0.7337 | 0.7210 | Strong |
| single_hop | 282 | 0.5801 | 0.3271 | Needs better exact fact extraction |
| multi_hop | 96 | 0.4157 | 0.3058 | Weakest; likely needs graph/fact expansion |
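A per-category breakdown like the one above amounts to grouping per-question scores by LoCoMo category and averaging within each group. The input row format `(category, hit, recall)` and the output keys in this sketch are hypothetical, not the layout of the report artifacts.

```python
from collections import defaultdict


def breakdown_by_category(rows):
    """rows: iterable of (category, hit, recall) tuples, one per question (hypothetical format)."""
    # Per category: [question count, summed hit, summed recall]
    sums = defaultdict(lambda: [0, 0.0, 0.0])
    for category, hit, recall in rows:
        acc = sums[category]
        acc[0] += 1
        acc[1] += hit
        acc[2] += recall
    return {
        category: {"n": n, "hit@20": hit_sum / n, "recall@20": recall_sum / n}
        for category, (n, hit_sum, recall_sum) in sums.items()
    }
```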

Scope

  • This is retrieval-only, not end-to-end LoCoMo answer accuracy.
  • BM25 sparse lexical retrieval runs outside SochDB.
  • Dense embeddings are computed outside SochDB with NVIDIA Nemotron (nvidia/llama-nemotron-embed-1b-v2).
  • SochDB is used as the dense vector backend through gRPC.
  • Concrete SochDB endpoint addresses and API keys are intentionally omitted.
  • Current latency reflects remote unary gRPC search calls and should not be interpreted as colocated SochDB core search latency.
  • Answer/judge evaluation is intentionally excluded because the current smoke run did not yet produce non-empty retrieved context for answering.
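To make the latency caveat concrete: if each search is timed around the client call, the measured wall-clock time includes request serialization and the network round-trip to the remote SochDB server, not just index search. `search_fn` below is a hypothetical stand-in for whatever gRPC client call the benchmark makes; no SochDB client API is assumed.

```python
import time
from statistics import mean


def timed_search(search_fn, queries):
    """Run search_fn per query; return results and mean wall-clock latency in ms.

    search_fn is hypothetical. If it wraps a remote unary gRPC call, each
    measurement covers client serialization, network transit, and server work.
    """
    results, latencies_ms = [], []
    for query in queries:
        start = time.perf_counter()
        results.append(search_fn(query))
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return results, mean(latencies_ms)
```

Timing the same call against a colocated server, or inside the server process, would be needed to isolate core search latency.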

Why this matters

This result establishes a stronger SochDB-backed memory retrieval baseline on LoCoMo.

It shows that SochDB can participate as the dense vector backend in a hybrid sparse+dense retrieval pipeline without degrading evidence recovery quality.

Files added

  • reports/locomo_retrieval_only_sochdb_nvidia_k20.md
  • reports/locomo_retrieval_only_sochdb_nvidia_k20.json

Validation

  • JSON artifact validated with uv run python -m json.tool.
  • Sanity assertions passed for benchmark name, mode, dataset counts, top-level result metrics, and category keys.
  • Secret/endpoint leak checks were run against the report artifacts and staged diff.
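The validation steps can be sketched as a small script. The JSON key names (`benchmark`, `mode`, `dataset`, `results`, `categories`, and the metric names), the `retrieval_only` mode value, and the leak patterns are assumptions for illustration; the actual schema of reports/locomo_retrieval_only_sochdb_nvidia_k20.json and the exact checks run for this PR are not shown.

```python
import json
import re

# Hypothetical schema expectations, based on the counts and categories reported in this PR.
EXPECTED_CATEGORIES = {"adversarial", "temporal", "open_domain", "single_hop", "multi_hop"}

# Illustrative leak patterns only: IPv4:port, http(s)/grpc(s) URLs, api_key assignments.
LEAK_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}:\d+\b"),
    re.compile(r"\b(?:https?|grpcs?)://\S+", re.IGNORECASE),
    re.compile(r"api[_-]?key\s*[:=]\s*\S+", re.IGNORECASE),
]


def _require(condition, message):
    # Raise instead of using assert so the checks survive `python -O`.
    if not condition:
        raise ValueError(message)


def check_report(text):
    """Parse the JSON (as json.tool would) and run sanity checks on hypothetical keys."""
    report = json.loads(text)
    _require(bool(report.get("benchmark")), "missing benchmark name")
    _require(report.get("mode") == "retrieval_only", "unexpected mode")
    dataset = report.get("dataset", {})
    _require(dataset.get("questions") == 1986, "question count mismatch")
    _require(dataset.get("memory_rows") == 5882, "memory row count mismatch")
    results = report.get("results", {})
    for metric in ("evidence_hit@20", "evidence_recall@20"):
        value = results.get(metric)
        _require(isinstance(value, float) and 0.0 <= value <= 1.0, f"bad {metric}")
    _require(set(report.get("categories", {})) == EXPECTED_CATEGORIES, "category keys mismatch")
    return report


def find_leaks(text):
    """Return every substring that matches one of the leak patterns."""
    return [m.group(0) for pattern in LEAK_PATTERNS for m in pattern.finditer(text)]
```

Running `find_leaks` over both the report files and the output of `git diff --staged` covers the artifacts and the staged diff in one pass.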

@sushanthpy merged commit 7cccdec into sochdb:main on May 30, 2026