GRPO-Aligned Rewriting (MQR-A1) × Reasoning Retrieval (MRE-T1) — Advancing Reasoning-Intensive Retrieval
MQR-A1 is a query rewriter built for reasoning-intensive retrieval tasks. Conventional models and agentic pipelines rely heavily on query expansion and are prone to the superficial semantic trap: negative samples that closely resemble the query at the surface level yet are logically mismatched with it.
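The trap can be illustrated with a small sketch. The query, the two documents, and the token-overlap scorer below are all made up for illustration; they stand in for any matcher that leans on surface similarity and are not part of MQR-A1 or MRE-T1.

```python
from __future__ import annotations
import re

def tokens(text: str) -> set[str]:
    """Lowercased word set; a stand-in for any surface-level matcher."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

query = "Why do my tomato plant leaves turn yellow after heavy rain?"
# Hard negative: shares most words with the query but is about a different cause.
negative = "My tomato plant leaves turn yellow after heavy frost, why?"
# Positive: explains the mechanism, yet shares no vocabulary with the query.
positive = ("Waterlogged soil deprives roots of oxygen, which blocks "
            "nitrogen uptake and causes chlorosis.")

scores = {"negative": jaccard(query, negative), "positive": jaccard(query, positive)}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)  # the hard negative is ranked above the true positive
```

A surface-level scorer puts the logically mismatched document first, which is the failure mode a reasoning-aware rewrite is meant to avoid.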
MQR-A1 shifts the retrieval paradigm from simple additive expansion to reinforcement-learning-driven intent distillation.
To overcome the limitations of standard LLM rewriting and conventional supervised fine-tuning (SFT), we built a training pipeline with three stages: heterogeneous candidate rewrite mining, SFT initialization, and GRPO.
To prevent the model from converging on rigid, template-based shortcuts during SFT, we implemented a heterogeneous candidate rewrite mining strategy. For every query, we dynamically synthesized a diverse set of natural-language-structured rewrites. This approach forces the model to prioritize underlying retrieval intent over superficial syntactic patterns, significantly enhancing the robustness of the resulting policy distribution.
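One way such a mining step might enforce diversity is a greedy near-duplicate filter over candidate rewrites. The sketch below is an assumption for illustration only: `select_diverse`, the `max_overlap` threshold, and the sample candidates are hypothetical and are not taken from the MQR-A1 pipeline.

```python
from __future__ import annotations
import re

def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def _overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two rewrites."""
    ta, tb = _tokens(a), _tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def select_diverse(candidates: list[str], max_overlap: float = 0.6, k: int = 4) -> list[str]:
    """Greedily keep candidates that are not near-duplicates of ones already kept."""
    kept: list[str] = []
    for cand in candidates:
        if all(_overlap(cand, prev) <= max_overlap for prev in kept):
            kept.append(cand)
        if len(kept) == k:
            break
    return kept

# Hypothetical candidate rewrites for a single query.
candidates = [
    "What causes chlorosis in tomato plants growing in waterlogged soil?",
    "What causes chlorosis in tomato plants growing in waterlogged soils?",  # near-duplicate
    "Explain how root oxygen deprivation leads to yellow leaves.",
    "Nitrogen uptake failure after flooding: mechanism and symptoms in vegetables.",
]
selected = select_diverse(candidates)
print(len(selected))  # the near-duplicate second candidate is dropped
```

Filtering on overlap between rewrites, rather than on any single template, keeps structurally different phrasings of the same intent in the training set.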
Traditional cross-entropy loss is inherently misaligned with retrieval objectives, so SFT serves here as an initialization stage rather than the final training objective. By injecting the natural-language structural features derived from the mining stage, we equip the base model with strong discriminative feature extraction capabilities and give the subsequent GRPO phase a stable starting point.
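To make the mismatch concrete, the standard SFT objective can be written as a token-level negative log-likelihood. The notation is assumed here and does not come from the source: $q$ is the query, $y$ a reference rewrite, and $\pi_\theta$ the rewriter.

$$
\mathcal{L}_{\mathrm{SFT}}(\theta) = -\sum_{t=1}^{|y|} \log \pi_\theta\left(y_t \mid q, y_{<t}\right)
$$

No term in this expression depends on where relevant documents end up in the ranking produced by $y$, so minimizing it only rewards imitating the reference rewrite.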
Unlike DPO (Direct Preference Optimization), which relies on static preference data, GRPO (Group Relative Policy Optimization) lets the model learn interactively from retrieval feedback within the actual document corpus. This allows the model to explore and extract highly discriminative features on its own, moving from superficial “textual matching” to deep “intent alignment”. By reshaping the policy distribution, GRPO concentrates probability mass on the highest-reward reasoning paths, which makes retrieval outcomes more consistent.
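The core of GRPO is that each sampled output is scored relative to the other samples drawn for the same prompt. A minimal sketch of that step follows, assuming binary-relevance nDCG@10 as the retrieval reward; the reward choice, document IDs, and function names are illustrative assumptions, not the reward function used for MQR-A1.

```python
from __future__ import annotations
import math
from statistics import mean, pstdev

def ndcg_at_k(ranked_ids: list[str], relevant: set[str], k: int = 10) -> float:
    """Binary-relevance nDCG@k, used here as a stand-in retrieval reward."""
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(ranked_ids[:k]) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical retrieval results for three rewrites sampled for the same query.
relevant = {"d1", "d2"}
rankings = [
    ["d1", "d2", "d9"],  # both relevant documents at the top
    ["d9", "d1", "d8"],  # one relevant document at rank 2
    ["d7", "d8", "d9"],  # nothing relevant retrieved
]
rewards = [ndcg_at_k(r, relevant) for r in rankings]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

Rewrites that retrieve better than their group mean get a positive advantage and are reinforced; the others are pushed down. This is how feedback from the live corpus replaces a static preference dataset.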
Our Multi-Dimensional Reward Function includes:
Mira-Reasoning-Retrieval (the combined pipeline of the MQR-A1 query rewriter and the MRE-T1 retriever) achieves state-of-the-art results on the BRIGHT benchmark. It ranks first by average score among retrieval pipelines on both the short- and long-document tracks, and MRE-T1 alone ranks first by average score among single-model methods on both tracks. Per-task results are mixed: for example, INF-X-Retriever scores higher on Leet. and AoPS in the short-document track. On average, the framework outperforms both existing query alignment models and complex, resource-heavy agentic pipelines.
1. Single-model Methods on Short Documents
| Model | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MRE-T1 | 55.3 | 56.5 | 32.9 | 48.2 | 33.1 | 34.2 | 37.3 | 35.0 | 35.5 | 16.7 | 43.3 | 46.9 | 39.6 |
| llama-nv-embed-reasoning-3b | 63.4 | 60.2 | 39.5 | 45.5 | 32.6 | 34.0 | 43.3 | 37.5 | 15.0 | 10.5 | 39.5 | 38.5 | 38.3 |
| ReasonEmbed-Qwen3-8B-0928 | 55.5 | 56.6 | 36.2 | 47.4 | 35.3 | 36.6 | 39.1 | 33.6 | 16.4 | 12.5 | 41.4 | 47.2 | 38.2 |
| ReasonEmbed-Qwen3-4B-0928 | 55.4 | 54.5 | 34.9 | 46.9 | 34.0 | 36.1 | 37.4 | 34.5 | 13.6 | 11.3 | 41.4 | 45.1 | 37.1 |
| Seed-1.5-Embedding | 34.8 | 46.9 | 23.4 | 31.6 | 19.1 | 25.4 | 21.0 | 43.2 | 4.9 | 12.2 | 33.3 | 30.5 | 27.2 |
| inf-retriever-v1-pro | 37.8 | 39.7 | 26.2 | 34.4 | 20.1 | 22.6 | 26.3 | 38.3 | 1.9 | 13.6 | 30.5 | 25.5 | 26.4 |
| ReasonIR-8B | 26.2 | 31.4 | 23.3 | 30.0 | 18.0 | 23.9 | 20.5 | 35.0 | 10.5 | 14.7 | 31.9 | 27.2 | 24.4 |
| Qwen3-Embedding-8B | 21.0 | 33.8 | 18.6 | 27.8 | 15.6 | 18.6 | 17.1 | 33.5 | 1.2 | 9.5 | 40.6 | 39.6 | 23.1 |
| BM25 | 18.9 | 27.2 | 14.9 | 12.5 | 13.6 | 18.4 | 15.0 | 24.4 | 7.9 | 6.2 | 10.4 | 4.9 | 14.5 |
| bge-large-en-v1.5 | 11.7 | 24.6 | 16.6 | 17.5 | 11.7 | 10.8 | 13.3 | 26.7 | 5.7 | 6.0 | 13.0 | 6.9 | 13.7 |
2. Retrieval Pipelines on Short Documents
| Model | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mira-Reasoning-Retrieval | 86.7 | 78.5 | 69.7 | 78.2 | 58.4 | 67.0 | 65.9 | 46.8 | 73.4 | 45.2 | 60.6 | 72.3 | 66.9 |
| INF-X-Retriever (inf+retrieve) | 79.8 | 70.9 | 69.9 | 73.3 | 57.7 | 64.3 | 61.9 | 56.1 | 54.5 | 51.9 | 53.1 | 67.9 | 63.4 |
| RakanEmb4B (inf+retrieve) | 65.9 | 56.9 | 59.0 | 60.7 | 49.0 | 52.8 | 53.3 | 35.6 | 55.4 | 22.0 | 53.9 | 64.5 | 52.4 |
| Nemo Retriever's Agentic Retrieval | 72.8 | 66.0 | 48.7 | 59.6 | 52.5 | 47.1 | 50.2 | 49.3 | 42.1 | 21.0 | 53.3 | 48.0 | 50.9 |
| DIVER-v3-GroupRank | 66.0 | 63.7 | 42.4 | 55.0 | 40.6 | 44.7 | 50.4 | 32.5 | 47.3 | 17.2 | 46.4 | 55.6 | 46.8 |
| BGE-Reasoner-0928 | 68.5 | 66.4 | 40.6 | 53.1 | 43.2 | 44.1 | 47.8 | 29.0 | 41.6 | 17.2 | 46.5 | 58.4 | 46.4 |
| Lattice Hierarchical Retrieval | 64.4 | 62.4 | 45.4 | 57.4 | 47.6 | 37.6 | 46.4 | 19.9 | 34.0 | 12.0 | 30.1 | 47.8 | 42.1 |
| ReasonRank (rerank RaDer) | 62.7 | 55.5 | 36.7 | 54.6 | 35.7 | 38.0 | 44.8 | 29.5 | 25.6 | 14.4 | 42.0 | 50.1 | 40.8 |
| XRR2 | 63.1 | 55.4 | 38.5 | 52.9 | 37.1 | 38.2 | 44.6 | 21.9 | 35.0 | 15.7 | 34.4 | 46.2 | 40.3 |
| RaDeR with Qwen reranking | 58.0 | 59.2 | 33.0 | 49.4 | 31.8 | 39.0 | 36.4 | 33.5 | 33.3 | 10.8 | 34.2 | 51.6 | 39.2 |
| ReasonIR with Rank-R1 | 59.5 | 55.1 | 37.9 | 52.7 | 30.0 | 39.3 | 45.1 | 32.1 | 17.1 | 10.7 | 40.4 | 45.6 | 38.8 |
3. Single-model Methods on Long Documents
| Model | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Pony | Avg |
|---|---|---|---|---|---|---|---|---|---|
| MRE-T1 | 46.5 | 46.0 | 34.5 | 52.7 | 27.7 | 22.2 | 45.2 | 6.3 | 35.1 |
| Google-Gecko-Text_Embedding-004 | - | - | - | - | - | - | - | - | 33.2 |
| inf-retriever-v1-pro | 44.1 | 42.2 | 31.4 | 43.1 | 20.8 | 21.4 | 41.0 | 0.4 | 30.5 |
| gte-Qwen1.5-7B-instruct | 39.2 | 36.1 | 25.7 | 42.3 | 21.3 | 23.5 | 33.1 | 1.3 | 27.8 |
| SFR-Embedding-Mistral | 30.3 | 37.0 | 24.3 | 47.7 | 17.3 | 14.5 | 35.0 | 2.0 | 26.0 |
| GritLM-7B | 37.5 | 40.3 | 25.7 | 34.4 | 17.8 | 20.1 | 32.4 | 0.0 | 26.0 |
| e5-mistral-7b-instruct | 29.9 | 36.3 | 26.2 | 46.7 | 17.3 | 14.5 | 32.2 | 1.1 | 25.5 |
| voyage-large-2-instruct | 34.4 | 35.4 | 26.7 | 41.6 | 12.9 | 12.8 | 31.1 | 1.3 | 24.5 |
| (Google) text-embedding-preview0409 | 30.9 | 38.0 | 21.9 | 30.7 | 12.9 | 19.2 | 25.7 | 0.3 | 22.4 |
| (OpenAI) text-embedding-3-large | 32.1 | 31.4 | 23.8 | 34.2 | 11.9 | 10.7 | 26.3 | 0.0 | 21.3 |
| Cohere-embed-english-v3.0 | 31.5 | 34.5 | 18.9 | 20.5 | 9.9 | 15.8 | 15.2 | 0.8 | 18.4 |
| SBERT | 25.6 | 34.1 | 18.9 | 15.8 | 10.9 | 15.0 | 18.0 | 1.2 | 17.4 |
| bge-large-en-v1.5 | 16.4 | 27.7 | 20.9 | 11.6 | 10.9 | 13.3 | 16.9 | 0.4 | 14.8 |
| BM25 | 10.7 | 15.4 | 10.7 | 8.4 | 7.4 | 22.2 | 10.7 | 5.4 | 11.4 |
4. Retrieval Pipelines on Long Documents
| Model | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Pony | Avg |
|---|---|---|---|---|---|---|---|---|---|
| Mira-Reasoning-Retrieval | 77.1 | 59.0 | 71.2 | 73.8 | 46.0 | 35.5 | 70.6 | 14.6 | 56.0 |
| INF-X-Retriever (inf+retrieve) | 73.2 | 59.6 | 69.3 | 74.3 | 55.9 | 27.8 | 64.8 | 12.0 | 54.6 |
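The Avg column in these tables appears to be the unweighted mean of the per-task scores, rounded to one decimal. A quick check against the two Mira-Reasoning-Retrieval rows above (the helper name `macro_average` is ours):

```python
def macro_average(scores: list) -> float:
    """Unweighted mean over tasks, rounded to one decimal like the Avg column."""
    return round(sum(scores) / len(scores), 1)

# Per-task scores copied from the Mira-Reasoning-Retrieval rows.
short_pipeline = [86.7, 78.5, 69.7, 78.2, 58.4, 67.0, 65.9, 46.8, 73.4, 45.2, 60.6, 72.3]
long_pipeline = [77.1, 59.0, 71.2, 73.8, 46.0, 35.5, 70.6, 14.6]

short_avg = macro_average(short_pipeline)
long_avg = macro_average(long_pipeline)
print(short_avg, long_avg)  # 66.9 56.0
```

Because every task carries equal weight, a large gain on one task (for example Pony in the short-document track) can offset losses on several others. The per-task columns are therefore worth reading alongside the average.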