Commit 9d5ae8e

Merge pull request #1528 from hanhainebula/master
Update BGE-Reasoner: Release v0923 embedder
2 parents c6e49d7 + 1ee2058 commit 9d5ae8e

2 files changed

Lines changed: 39 additions & 5 deletions

research/BGE_Reasoner/README.md

@@ -7,7 +7,7 @@
 We introduce **BGE-Reasoner**, an end-to-end reasoning-intensive information retrieval framework. BGE-Reasoner is characterized by three key features:

 1. **End-to-end**: It comprises three core components in IR—**BGE-Reasoner-Rewriter**, **BGE-Reasoner-Embed**, and **BGE-Reasoner-Reranker**—covering the entire retrieval pipeline, from query rewriting and retrieval to reranking for reasoning-intensive tasks.
-2. **Excellent performance**: **BGE-Reasoner** achieves **state-of-the-art (SOTA)** performance on [BRIGHT](https://brightbenchmark.github.io/), a reasoning-intensive information retrieval benchmark, with an **nDCG@10 of 45.2** across 12 datasets, outperforming the previous SOTA by +3.6 points (41.6 from [DIVER](https://arxiv.org/pdf/2508.07995), Aug 12, 2025).
+2. **Excellent performance**: **BGE-Reasoner** achieves **state-of-the-art (SOTA)** performance on [BRIGHT](https://brightbenchmark.github.io/), a reasoning-intensive information retrieval benchmark, with an **nDCG@10 of 45.2** across 12 datasets (released on Aug 21, 2025), outperforming the previous SOTA by +3.6 points (41.6 from [DIVER](https://arxiv.org/pdf/2508.07995), Aug 12, 2025).
 3. **Open-source resources**: We will release the code, model checkpoints, training data, and evaluation scripts to facilitate future research on reasoning-intensive information retrieval. Please stay tuned!

@@ -17,8 +17,9 @@ We introduce **BGE-Reasoner**, an end-to-end reasoning-intensive information ret
 | ------------------ | --------------------- | ----------- | ------------------ | ------------------ |
 | Model | BGE-Reasoner-Rewriter | [🤗]() (TBA) | - | |
 | Model | BGE-Reasoner-Reranker | [🤗]() (TBA) | - | |
-| Model | BGE-Reasoner-Embed | [🤗]() (TBA) | - | |
-| Search Results | BGE-Reasoner-Embed-0821 Search Results | [🤗](https://huggingface.co/datasets/hanhainebula/bright-search-results_bge-reasoner-embed-0821/tree/main) | Sep 4, 2025 | nDCG@10 = 32.5, submission to BRIGHT leaderboard on Aug 21, 2025 |
+| Model | BGE-Reasoner-Embed-Qwen3-8B-0923 | [🤗](https://huggingface.co/BAAI/bge-reasoner-embed-qwen3-8b-0923) | Sep 23, 2025 | nDCG@10 = 37.2 using original query, fine-tuned on [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) with our latest refined training data (data to be released) |
+| Search Results | BGE-Reasoner-Embed-Qwen3-8B-0923 Search Results | [🤗](https://huggingface.co/BAAI/bge-reasoner-embed-qwen3-8b-0923/tree/main/search_results) | Sep 23, 2025 | nDCG@10 = 37.2 using original query |
+| Search Results | BGE-Reasoner-Embed-0821 Search Results | [🤗](https://huggingface.co/datasets/hanhainebula/bright-search-results_bge-reasoner-embed-0821/tree/main) | Sep 4, 2025 | nDCG@10 = 32.5 using original query, submission to BRIGHT leaderboard on Aug 21, 2025 |
 | Training Data | BGE-Reasoner-Data | [🤗](https://huggingface.co/datasets/hanhainebula/bge-reasoner-data/tree/main/bge-reasoner-data-0904) | Sep 4, 2025 | part of our training data; full data to be released in the future |
 | Evaluation Scripts | - | (TBA) | - | |

@@ -67,13 +68,46 @@ Note:
 ### Embedder & Rewriter Results


-**BGE-Reasoner-Embed-0821**, submitted to the BRIGHT leaderboard on Aug 21, 2025, also achieves excellent performance on the benchmark:
+#### BGE-Reasoner-Embed-Qwen3-8B-0923
+
+**BGE-Reasoner-Embed-Qwen3-8B-0923**, fine-tuned on [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) with our latest refined training data (data to be released), achieves strong performance on the BRIGHT benchmark:
+
+- With original queries, it attains **nDCG@10 = 37.2**, an absolute improvement of **+8.3** over the previous best ([DIVER](https://arxiv.org/pdf/2508.07995): 28.9).
+- Using the GPT-4 reasoning queries provided by BRIGHT, the score increases to **39.7**, which is **+7.6** higher than DIVER’s corresponding result (32.1).
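
For readers unfamiliar with the metric quoted in the bullets above, the following is a minimal sketch of how nDCG@10 can be computed for a single query. It assumes linear gains and a log2 rank discount (the common trec_eval-style definition); the function names and the toy data are hypothetical, and this is not the evaluation script used for the reported numbers, which has not been released yet.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k gains (rank 1 has discount log2(2) = 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document IDs in the order returned by the retriever.
    relevance: dict mapping relevant document IDs to a graded gain (assumed > 0).
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(gains, k) / ideal_dcg


# Hypothetical toy query: two relevant documents, retrieved at ranks 1 and 3.
toy_ranking = ["d1", "d2", "d3"]
toy_relevance = {"d1": 1, "d3": 1}
toy_score = ndcg_at_k(toy_ranking, toy_relevance)
```

A benchmark-level score such as the ones above would then be the mean of the per-query values within each dataset, averaged across datasets.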
+
+> On Sep 23, 2025, we released the first-stage search results of BGE-Reasoner-Embed-Qwen3-8B-0923 using original queries and GPT-4 reasoning queries (Top-2000 candidates; excluded IDs removed) [here](https://huggingface.co/BAAI/bge-reasoner-embed-qwen3-8b-0923/tree/main/search_results). The model checkpoint is available [here](https://huggingface.co/BAAI/bge-reasoner-embed-qwen3-8b-0923).
+
+![BGE-Reasoner-Embed-Qwen3-8B-0923 Results](./imgs/embedder-0923_results.png)
+
+Note:
+- "**Avg - ALL**" refers to the average performance across **all 12 datasets** in the BRIGHT benchmark.
+- "**Avg - SE**" refers to the average performance across the **7 datasets in the StackExchange subset** of the BRIGHT benchmark.
+- "**Avg - CD**" refers to the average performance across the **2 datasets in the Coding subset** of the BRIGHT benchmark.
+- "**Avg - MT**" refers to the average performance across the **3 datasets in the Theorem-based subset** of the BRIGHT benchmark.
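
The four averages described in the note can be reproduced from per-dataset scores with a few lines of Python. The sketch below is illustrative only: the dataset identifiers are assumed to match the BRIGHT split names, the `summarize` helper is hypothetical, and the scores in the example are placeholders, not reported results.

```python
from statistics import mean

# Assumed BRIGHT dataset names, grouped as in the note (7 + 2 + 3 = 12).
SUBSETS = {
    "SE": ["biology", "earth_science", "economics", "psychology",
           "robotics", "stackoverflow", "sustainable_living"],
    "CD": ["leetcode", "pony"],
    "MT": ["aops", "theoremqa_questions", "theoremqa_theorems"],
}


def summarize(per_dataset_ndcg):
    """Return the "Avg - SE/CD/MT/ALL" values from a dataset -> nDCG@10 mapping."""
    summary = {
        f"Avg - {name}": mean(per_dataset_ndcg[d] for d in datasets)
        for name, datasets in SUBSETS.items()
    }
    # "Avg - ALL" is the mean over all 12 datasets, not the mean of the subset means.
    all_datasets = [d for datasets in SUBSETS.values() for d in datasets]
    summary["Avg - ALL"] = mean(per_dataset_ndcg[d] for d in all_datasets)
    return summary


# Placeholder scores purely for illustration.
placeholder_scores = {d: 30.0 for d in SUBSETS["SE"]}
placeholder_scores.update({d: 20.0 for d in SUBSETS["CD"]})
placeholder_scores.update({d: 40.0 for d in SUBSETS["MT"]})
placeholder_summary = summarize(placeholder_scores)
```

Because the subsets differ in size, "Avg - ALL" is weighted toward the StackExchange datasets, which is why it generally differs from the plain mean of the three subset averages.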
+
+> Sources of Results:
+>
+> [1] https://arxiv.org/pdf/2407.12883
+>
+> [2] https://arxiv.org/pdf/2504.20595
+>
+> [3] https://github.com/Debrup-61/RaDeR
+>
+> [4] https://seed1-5-embedding.github.io
+>
+> [5] https://arxiv.org/pdf/2508.07995
+>
+> *: results evaluated with our script
+
+#### BGE-Reasoner-Embed-0821
+
+**BGE-Reasoner-Embed-0821**, submitted to the BRIGHT leaderboard on Aug 21, 2025, achieves excellent performance on the benchmark:

 - With original queries, it attains **nDCG@10 = 32.5**, an absolute improvement of **+3.6** over the previous best ([DIVER](https://arxiv.org/pdf/2508.07995): 28.9).
 - Using the GPT-4 reasoning queries provided by BRIGHT, the score increases to **37.7**, which is **+5.6** higher than DIVER’s corresponding result (32.1). Combining our embedding-based retrieval with BM25 (hybrid fusion, weights: 0.75 / 0.25) yields **nDCG@10 = 40.2**.
 - Finally, when using rewritten queries produced by **BGE-Reasoner-Rewriter** and fusing with BM25 (weights: 0.75 / 0.25), we reach **nDCG@10 = 40.8**.
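
The hybrid fusion mentioned in the bullets above (embedding retrieval combined with BM25, weights 0.75 / 0.25) could look roughly like the following sketch. The README does not state how scores are normalized before weighting, so per-query min-max normalization is an assumption here, as are the function names and the mapping of 0.75 to the embedder and 0.25 to BM25.

```python
def min_max_normalize(scores):
    """Rescale a doc_id -> score mapping to [0, 1] (assumed normalization, not confirmed by the source)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(dense_scores, bm25_scores, dense_weight=0.75, bm25_weight=0.25):
    """Weighted sum of normalized dense and BM25 scores for one query.

    Documents missing from one result list contribute 0 for that retriever.
    Returns (doc_id, fused_score) pairs sorted from best to worst.
    """
    dense = min_max_normalize(dense_scores)
    bm25 = min_max_normalize(bm25_scores)
    fused = {
        doc_id: dense_weight * dense.get(doc_id, 0.0) + bm25_weight * bm25.get(doc_id, 0.0)
        for doc_id in dense.keys() | bm25.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for a single query.
example_dense = {"a": 0.9, "b": 0.5, "c": 0.1}
example_bm25 = {"b": 12.0, "c": 3.0, "d": 6.0}
example_ranking = fuse(example_dense, example_bm25)
```

Normalizing before weighting keeps the unbounded BM25 scores from dominating the bounded similarity scores of the embedder; other schemes (for example z-score normalization or reciprocal rank fusion) would be equally plausible readings of "hybrid fusion".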

-> On Sep 4, 2025, we released the first-stage search results of BGE-Reasoner-Embed-0821 using original queries and GPT-4 reasoning queries (Top-2000 candidates; excluded IDs removed) [here](https://huggingface.co/datasets/hanhainebula/bright-search-results_bge-reasoner-embed-0821/tree/main).
+> On Sep 4, 2025, we released the first-stage search results of BGE-Reasoner-Embed-0821 using original queries and GPT-4 reasoning queries (Top-2000 candidates; excluded IDs removed) [here](https://huggingface.co/datasets/hanhainebula/bright-search-results_bge-reasoner-embed-0821/tree/main). The model checkpoint will not be released due to its suboptimal performance compared to BGE-Reasoner-Embed-Qwen3-8B-0923.


 ![BGE-Reasoner-Embed & BGE-Reasoner-Rewriter Results](./imgs/embedder-rewriter_results.png)