benchmark(locomo-retrieval-only): add SochDB-backed NVIDIA hybrid baseline by tatavishnurao · Pull Request #6 · sochdb/sochdb-benchmarks

tatavishnurao · 2026-05-22T23:30:20Z

Summary

Adds a LoCoMo retrieval-only benchmark artifact for a SochDB-backed hybrid retrieval pipeline.

This benchmark evaluates evidence recovery, not answer-generation accuracy.

Configuration

Dataset: LoCoMo converted QA split
Questions: 1,986
Memory rows: 5,882 raw conversation turns
Sparse retriever: BM25
Dense embedding model: nvidia/llama-nemotron-embed-1b-v2
Dense vector backend: SochDB gRPC
Fusion: RRF
k: 20
candidate_k: 100
bm25_weight: 1.5
vector_weight: 0.75
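
The fusion settings above can be read as a weighted reciprocal rank fusion. The sketch below is a minimal illustration, not the benchmark's actual code: the function name `weighted_rrf`, the smoothing constant `rrf_c=60` (a common RRF default), and the tie-breaking rule are assumptions, since the PR does not state them.

```python
from collections import defaultdict


def weighted_rrf(bm25_ids, vector_ids, *, k=20, candidate_k=100,
                 bm25_weight=1.5, vector_weight=0.75, rrf_c=60):
    """Fuse two ranked ID lists with weighted reciprocal rank fusion.

    rrf_c is an assumed smoothing constant; the PR does not specify one.
    """
    scores = defaultdict(float)
    for weight, ranked in ((bm25_weight, bm25_ids), (vector_weight, vector_ids)):
        # Only the top candidate_k results of each retriever contribute.
        for rank, doc_id in enumerate(ranked[:candidate_k], start=1):
            scores[doc_id] += weight / (rrf_c + rank)
    # Highest fused score first; ties broken by ID for determinism (assumption).
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused[:k]


if __name__ == "__main__":
    # Hypothetical turn IDs, purely for illustration.
    print(weighted_rrf(["a", "b", "c"], ["c", "a", "d"], k=2))
```

With these weights, a turn ranked highly by BM25 contributes twice as much as the same rank from the dense retriever, so lexical matches dominate unless both retrievers agree.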

Result

System	n	Evidence Hit@20	Evidence Recall@20	Avg Context Tokens	Avg Latency
BM25 + SochDB vector + NVIDIA Nemotron + RRF	1,986	0.7056	0.6522	657.98	200.81 ms
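
The PR does not spell out how the two evidence metrics are computed. A plausible reading, sketched here as an assumption, is that Evidence Hit@k is 1 when at least one gold evidence turn appears in the top k results, and Evidence Recall@k is the fraction of gold evidence turns found in the top k, each averaged over questions. The function names and turn IDs below are hypothetical.

```python
def evidence_hit_at_k(retrieved_ids, gold_ids, k=20):
    """Return 1.0 if any gold evidence ID is in the top-k results, else 0.0."""
    return 1.0 if set(retrieved_ids[:k]) & set(gold_ids) else 0.0


def evidence_recall_at_k(retrieved_ids, gold_ids, k=20):
    """Return the fraction of gold evidence IDs present in the top-k results."""
    gold = set(gold_ids)
    if not gold:
        # Questions without gold evidence score 0.0 here (assumed convention).
        return 0.0
    return len(gold & set(retrieved_ids[:k])) / len(gold)


def mean(values):
    """Average per-question scores into a single benchmark number."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    retrieved = ["t1", "t7", "t3"]   # hypothetical ranked turn IDs
    gold = ["t3", "t9"]              # hypothetical gold evidence turns
    print(evidence_hit_at_k(retrieved, gold), evidence_recall_at_k(retrieved, gold))
```

Under these definitions, per-question recall can never exceed hit, which is consistent with every row reported in this PR.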

Category Breakdown

Category	n	Evidence Hit@20	Evidence Recall@20	Notes
adversarial	446	0.7534	0.7455	Strong
temporal	321	0.7563	0.7232	Strong
open_domain	841	0.7337	0.7210	Strong
single_hop	282	0.5801	0.3271	Needs better exact fact extraction
multi_hop	96	0.4157	0.3058	Weakest; likely needs graph/fact expansion

Scope

This is retrieval-only, not end-to-end LoCoMo answer accuracy.
BM25 is used externally for sparse lexical retrieval.
NVIDIA Nemotron is used externally for dense embeddings.
SochDB is used as the dense vector backend through gRPC.
Concrete SochDB endpoint addresses and API keys are intentionally omitted.
Current latency reflects remote unary gRPC search calls and should not be interpreted as colocated SochDB core search latency.
Answer and judge evaluation is intentionally excluded: the current smoke run did not yet construct non-empty context, so answer-level scores would not be meaningful.

Why this matters

This result establishes a stronger SochDB-backed memory retrieval baseline on LoCoMo.

It shows that SochDB can participate as the dense vector backend in a hybrid sparse+dense retrieval pipeline without degrading evidence recovery quality (0.7056 Evidence Hit@20 and 0.6522 Evidence Recall@20 overall).

Files added

reports/locomo_retrieval_only_sochdb_nvidia_k20.md
reports/locomo_retrieval_only_sochdb_nvidia_k20.json

Validation

JSON artifact validated with uv run python -m json.tool.
Sanity assertions passed for benchmark name, mode, dataset counts, top-level result metrics, and category keys.
Secret/endpoint leak checks were run against the report artifacts and staged diff.
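
The sanity assertions could look roughly like the sketch below. The JSON layout (`questions`, `overall`, `categories`, `hit_at_k`, `recall_at_k`) is a hypothetical schema chosen for illustration; the real artifact's keys are not shown in this PR. The numbers are copied from the tables above.

```python
import json

# Hypothetical schema; values taken from the Result and Category Breakdown tables.
REPORT_JSON = """
{
  "questions": 1986,
  "overall": {"hit_at_k": 0.7056, "recall_at_k": 0.6522},
  "categories": {
    "adversarial": {"n": 446, "hit_at_k": 0.7534, "recall_at_k": 0.7455},
    "temporal":    {"n": 321, "hit_at_k": 0.7563, "recall_at_k": 0.7232},
    "open_domain": {"n": 841, "hit_at_k": 0.7337, "recall_at_k": 0.7210},
    "single_hop":  {"n": 282, "hit_at_k": 0.5801, "recall_at_k": 0.3271},
    "multi_hop":   {"n": 96,  "hit_at_k": 0.4157, "recall_at_k": 0.3058}
  }
}
"""

EXPECTED_CATEGORIES = {
    "adversarial", "temporal", "open_domain", "single_hop", "multi_hop",
}


def check_report(report):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    categories = report.get("categories", {})
    if set(categories) != EXPECTED_CATEGORIES:
        problems.append("unexpected category keys")
    # Per-category question counts should add up to the dataset total.
    total = sum(row.get("n", 0) for row in categories.values())
    if total != report.get("questions"):
        problems.append(f"category counts sum to {total}, expected {report.get('questions')}")
    # Every metric must be a float in [0, 1].
    rows = [("overall", report.get("overall", {}))] + sorted(categories.items())
    for name, row in rows:
        for key in ("hit_at_k", "recall_at_k"):
            value = row.get(key)
            if not isinstance(value, float) or not 0.0 <= value <= 1.0:
                problems.append(f"{name}.{key} out of range: {value!r}")
    return problems


if __name__ == "__main__":
    print(check_report(json.loads(REPORT_JSON)) or "all checks passed")
```

Returning a list of problems rather than raising on the first failure makes it easier to see every inconsistency in one run.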

benchmark(locomo-retrieval-only): add SochDB NVIDIA hybrid baseline

fa8ec47

sushanthpy merged commit 7cccdec into sochdb:main May 30, 2026

sushanthpy merged 1 commit into sochdb:main from tatavishnurao:benchmark/locomo-retrieval-only-sochdb-nvidia
