research/BGE_Reasoner/README.md
Lines changed: 39 additions & 5 deletions
@@ -7,7 +7,7 @@
 We introduce **BGE-Reasoner**, an end-to-end reasoning-intensive information retrieval framework. BGE-Reasoner is characterized by three key features:

 1. **End-to-end**: It comprises three core components in IR—**BGE-Reasoner-Rewriter**, **BGE-Reasoner-Embed**, and **BGE-Reasoner-Reranker**—covering the entire retrieval pipeline, from query rewriting and retrieval to reranking for reasoning-intensive tasks.
-2. **Excellent performance**: **BGE-Reasoner** achieves **state-of-the-art (SOTA)** performance on [BRIGHT](https://brightbenchmark.github.io/), a reasoning-intensive information retrieval benchmark, with an **nDCG@10 of 45.2** across 12 datasets, outperforming the previous SOTA by +3.6 points (41.6 from [DIVER](https://arxiv.org/pdf/2508.07995), Aug 12, 2025).
+2. **Excellent performance**: **BGE-Reasoner** achieves **state-of-the-art (SOTA)** performance on [BRIGHT](https://brightbenchmark.github.io/), a reasoning-intensive information retrieval benchmark, with an **nDCG@10 of 45.2** across 12 datasets (released on Aug 21, 2025), outperforming the previous SOTA by +3.6 points (41.6 from [DIVER](https://arxiv.org/pdf/2508.07995), Aug 12, 2025).
 3. **Open-source resources**: We will release the code, model checkpoints, training data, and evaluation scripts to facilitate future research on reasoning-intensive information retrieval. Please stay tuned!

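For readers unfamiliar with the metric quoted in this hunk, here is a minimal sketch of nDCG@10 for a single query, using linear gains and a log2 rank discount. It is a textbook formulation, not the project's evaluation script. The function names `dcg_at_k` and `ndcg_at_k` and the toy document IDs are illustrative assumptions. Benchmark figures such as 45.2 are presumably this per-query value averaged over queries and datasets and reported on a 0-100 scale.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (linear gain)."""
    # Position 0 gets discount log2(2) = 1, position 1 gets log2(3), and so on.
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """nDCG@k for one query.

    ranked_doc_ids: retrieved document IDs, best first (hypothetical input format).
    relevance: graded relevance per document ID; missing IDs count as 0.
    """
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_doc_ids]
    # The ideal ranking places the most relevant documents first.
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(gains, k) / ideal_dcg


if __name__ == "__main__":
    # Toy example: two relevant documents, one of them ranked third.
    print(ndcg_at_k(["a", "b", "c"], {"a": 1.0, "c": 1.0}))
```

A ranking that places every relevant document at the top scores 1.0; pushing a relevant document further down lowers the score through the logarithmic discount.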
@@ -17,8 +17,9 @@ We introduce **BGE-Reasoner**, an end-to-end reasoning-intensive information ret
 | Model | BGE-Reasoner-Rewriter |[🤗]() (TBA) | - ||
 | Model | BGE-Reasoner-Reranker |[🤗]() (TBA) | - ||
-| Model | BGE-Reasoner-Embed |[🤗]() (TBA) | - ||
-| Search Results | BGE-Reasoner-Embed-0821 Search Results |[🤗](https://huggingface.co/datasets/hanhainebula/bright-search-results_bge-reasoner-embed-0821/tree/main)| Sep 4, 2025 | nDCG@10 = 32.5, submission to BRIGHT leaderboard on Aug 21, 2025 |
+| Model | BGE-Reasoner-Embed-Qwen3-8B-0923 |[🤗](https://huggingface.co/BAAI/bge-reasoner-embed-qwen3-8b-0923)| Sep 23, 2025 | nDCG@10 = 37.2 using original query, fine-tuned on [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) with our latest refined training data (data to be released) |
+| Search Results | BGE-Reasoner-Embed-Qwen3-8B-0923 Search Results |[🤗](https://huggingface.co/BAAI/bge-reasoner-embed-qwen3-8b-0923/tree/main/search_results)| Sep 23, 2025 | nDCG@10 = 37.2 using original query |
+| Search Results | BGE-Reasoner-Embed-0821 Search Results |[🤗](https://huggingface.co/datasets/hanhainebula/bright-search-results_bge-reasoner-embed-0821/tree/main)| Sep 4, 2025 | nDCG@10 = 32.5 using original query, submission to BRIGHT leaderboard on Aug 21, 2025 |
 | Training Data | BGE-Reasoner-Data |[🤗](https://huggingface.co/datasets/hanhainebula/bge-reasoner-data/tree/main/bge-reasoner-data-0904)| Sep 4, 2025 | part of our training data; full data to be released in the future |
 | Evaluation Scripts | - | (TBA) | - ||

@@ -67,13 +68,46 @@ Note:
 ### Embedder & Rewriter Results

-**BGE-Reasoner-Embed-0821**, submitted to the BRIGHT leaderboard on Aug 21, 2025, also achieves excellent performance on the benchmark:
+#### BGE-Reasoner-Embed-Qwen3-8B-0923
+
+**BGE-Reasoner-Embed-Qwen3-8B-0923**, fine-tuned on [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) with our latest refined training data (data to be released), achieves strong performance on the BRIGHT benchmark:
+
+- With original queries, it attains **nDCG@10 = 37.2**, an absolute improvement of **+8.3** over the previous best ([DIVER](https://arxiv.org/pdf/2508.07995): 28.9).
+- Using the GPT-4 reasoning queries provided by BRIGHT, the score increases to **39.7**, which is **+7.6** higher than DIVER’s corresponding result (32.1).
+
+> On Sep 23, 2025, we released the first-stage search results of BGE-Reasoner-Embed-Qwen3-8B-0923 using original queries and GPT-4 reasoning queries (Top-2000 candidates; excluded IDs removed) [here](https://huggingface.co/BAAI/bge-reasoner-embed-qwen3-8b-0923/tree/main/search_results). The model checkpoint is available [here](https://huggingface.co/BAAI/bge-reasoner-embed-qwen3-8b-0923).
+- "**Avg - ALL**" refers to the average performance across **all 12 datasets** in the BRIGHT benchmark.
+- "**Avg - SE**" refers to the average performance across the **7 datasets in the StackExchange subset** of the BRIGHT benchmark.
+- "**Avg - CD**" refers to the average performance across the **2 datasets in the Coding subset** of the BRIGHT benchmark.
+- "**Avg - MT**" refers to the average performance across the **3 datasets in the Theorem-based subset** of the BRIGHT benchmark.
+
+> Sources of Results:
+>
+> [1] https://arxiv.org/pdf/2407.12883
+>
+> [2] https://arxiv.org/pdf/2504.20595
+>
+> [3] https://github.com/Debrup-61/RaDeR
+>
+> [4] https://seed1-5-embedding.github.io
+>
+> [5] https://arxiv.org/pdf/2508.07995
+>
+> *: results evaluated with our script
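The four averages defined in the notes above can be sketched as follows. This is an illustration under stated assumptions, not the project's evaluation script: the dataset keys are the names BRIGHT appears to use for its 12 datasets, `SUBSETS` and `summarize` are hypothetical names, and the scores in the demo are placeholders rather than reported results. The sketch shows that "Avg - ALL" is taken over all 12 datasets directly, so it generally differs from the mean of the three subset averages because the subsets have different sizes.

```python
from statistics import mean
from typing import Dict, List

# Assumed grouping of the 12 BRIGHT datasets into the three subsets (7 + 2 + 3).
SUBSETS: Dict[str, List[str]] = {
    "SE": ["biology", "earth_science", "economics", "psychology",
           "robotics", "stackoverflow", "sustainable_living"],
    "CD": ["leetcode", "pony"],
    "MT": ["aops", "theoremqa_questions", "theoremqa_theorems"],
}


def summarize(scores: Dict[str, float]) -> Dict[str, float]:
    """Compute Avg - SE / CD / MT and Avg - ALL from per-dataset nDCG@10 scores."""
    summary = {
        f"Avg - {name}": mean(scores[d] for d in datasets)
        for name, datasets in SUBSETS.items()
    }
    # Avg - ALL weights every dataset equally, not every subset equally.
    all_datasets = [d for datasets in SUBSETS.values() for d in datasets]
    summary["Avg - ALL"] = mean(scores[d] for d in all_datasets)
    return summary


if __name__ == "__main__":
    # Placeholder scores for illustration only; these are NOT reported results.
    demo = {d: 30.0 for d in SUBSETS["SE"]}
    demo.update({d: 20.0 for d in SUBSETS["CD"]})
    demo.update({d: 40.0 for d in SUBSETS["MT"]})
    for name, value in summarize(demo).items():
        print(f"{name}: {value:.1f}")
```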
+
+#### BGE-Reasoner-Embed-0821
+
+**BGE-Reasoner-Embed-0821**, submitted to the BRIGHT leaderboard on Aug 21, 2025, achieves excellent performance on the benchmark:

 - With original queries, it attains **nDCG@10 = 32.5**, an absolute improvement of **+3.6** over the previous best ([DIVER](https://arxiv.org/pdf/2508.07995): 28.9).
 - Using the GPT-4 reasoning queries provided by BRIGHT, the score increases to **37.7**, which is **+5.6** higher than DIVER’s corresponding result (32.1). Combining our embedding-based retrieval with BM25 (hybrid fusion, weights: 0.75 / 0.25) yields **nDCG@10 = 40.2**.
 - Finally, when using rewritten queries produced by **BGE-Reasoner-Rewriter** and fusing with BM25 (weights: 0.75 / 0.25), we reach **nDCG@10 = 40.8**.

-> On Sep 4, 2025, we released the first-stage search results of BGE-Reasoner-Embed-0821 using original queries and GPT-4 reasoning queries (Top-2000 candidates; excluded IDs removed) [here](https://huggingface.co/datasets/hanhainebula/bright-search-results_bge-reasoner-embed-0821/tree/main).
+> On Sep 4, 2025, we released the first-stage search results of BGE-Reasoner-Embed-0821 using original queries and GPT-4 reasoning queries (Top-2000 candidates; excluded IDs removed) [here](https://huggingface.co/datasets/hanhainebula/bright-search-results_bge-reasoner-embed-0821/tree/main). The model checkpoint will not be released due to its suboptimal performance compared to BGE-Reasoner-Embed-Qwen3-8B-0923.
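The hybrid fusion with BM25 mentioned for BGE-Reasoner-Embed-0821 (weights 0.75 / 0.25) could look roughly like the sketch below. The README does not say how scores are normalized or which weight belongs to which retriever. This sketch assumes per-query min-max normalization, 0.75 for the embedding scores, and 0.25 for BM25. The function names and the toy scores are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one query's scores to [0, 1]; a constant score list maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    dense: Dict[str, float],
    bm25: Dict[str, float],
    dense_weight: float = 0.75,  # assumed to apply to the embedding retriever
    bm25_weight: float = 0.25,   # assumed to apply to BM25
) -> Dict[str, float]:
    """Weighted sum of normalized scores; a document missing from one list gets 0 there."""
    dense_n = min_max_normalize(dense)
    bm25_n = min_max_normalize(bm25)
    return {
        doc_id: dense_weight * dense_n.get(doc_id, 0.0) + bm25_weight * bm25_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(bm25_n)
    }


def rank(fused: Dict[str, float], k: int = 10) -> List[str]:
    """Top-k document IDs by fused score, ties broken by document ID."""
    ordered = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:k]]


if __name__ == "__main__":
    # Toy scores for one query; not taken from the released search results.
    dense = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
    bm25 = {"d2": 12.0, "d3": 4.0, "d4": 8.0}
    print(rank(fuse_scores(dense, bm25)))
```

Normalizing before the weighted sum matters because BM25 scores and embedding similarities live on different scales; other fusion schemes (for example, rank-based fusion) would be equally plausible readings of the README.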