mira-reasoning-retrieval

⚡ Mira-Reasoning-Retrieval

GRPO-Aligned Rewriting (MQR-A1) × Reasoning Retrieval (MRE-T1) — Advancing Reasoning-Intensive Retrieval

Rank Retriever Rewriter Company Pipeline Training

📖 Overview

MQR-A1 represents a paradigm shift in reasoning-intensive retrieval tasks. Conventional models and agentic pipelines rely heavily on query expansion and are prone to the superficial semantic trap—a scenario where negative samples exhibit high surface-level textual similarity to queries yet are logically mismatched.


🚀 Highlights

The MQR-A1 shifts the retrieval paradigm from simple additive expansion to reinforcement-learning-driven intent distillation:


🛠️ Methodology & Architecture

To overcome the limitations of standard LLM rewriting and conventional SFT (Supervised Fine-Tuning), we have constructed a highly robust engineering pipeline.

1. Candidate Rewrite Mining

To prevent the model from converging on rigid, template-based shortcuts during SFT, we implemented a heterogeneous candidate rewrite mining strategy. For every query, we dynamically synthesized a diverse set of natural-language-structured rewrites. This approach forces the model to prioritize underlying retrieval intent over superficial syntactic patterns, significantly enhancing the robustness of the resulting policy distribution.

2. Cold Start

Traditional cross-entropy loss is inherently misaligned with retrieval objectives. By injecting natural language structural features derived from the mining stage, we equip the base model with strong discriminative feature extraction capabilities, thereby laying a stable initialization foundation for the subsequent GRPO phase.

3. GRPO-Driven Intent Alignment

Unlike DPO, which relies on static preference data, GRPO enables the model to engage in interactive learning via retrieval feedback within the actual document corpus environment. This allows the model to autonomously explore and extract highly discriminative features, facilitating a fundamental leap from superficial “textual matching” to deep “intent alignment”. By reshaping the policy distribution, GRPO creates steep probability peaks over optimal reasoning paths, ensuring consistent and high-quality retrieval outcomes.

Our Multi-Dimensional Reward Function includes:


🏆 Performance

Mira-Reasoning-Retrieval (the combined pipeline of MQR-A1 query rewriter and MRE-T1 retriever) achieves state-of-the-art results on the rigorous BRIGHT Benchmark. The system demonstrates categorical dominance by securing the No. 1 position across all evaluated dimensions, including both short and long document tracks. Notably, our framework outperforms both existing query alignment models and complex, resource-heavy agentic pipelines, establishing a new performance frontier in reasoning-intensive retrieval.

1、Single-model Methods on Short Documents

Model Bio. Earth. Econ. Psy. Rob. Stack. Sus. Leet. Pony AoPS TheoQ. TheoT. Avg
MRE-T1 55.3 56.5 32.9 48.2 33.1 34.2 37.3 35.0 35.5 16.7 43.3 46.9 39.6
llama-nv-embed-reasoning-3b 63.4 60.2 39.5 45.5 32.6 34.0 43.3 37.5 15.0 10.5 39.5 38.5 38.3
ReasonEmbed-Qwen3-8B-0928 55.5 56.6 36.2 47.4 35.3 36.6 39.1 33.6 16.4 12.5 41.4 47.2 38.2
ReasonEmbed-Qwen3-4B-0928 55.4 54.5 34.9 46.9 34.0 36.1 37.4 34.5 13.6 11.3 41.4 45.1 37.1
Seed-1.5-Embedding 34.8 46.9 23.4 31.6 19.1 25.4 21.0 43.2 4.9 12.2 33.3 30.5 27.2
inf-retriever-v1-pro 37.8 39.7 26.2 34.4 20.1 22.6 26.3 38.3 1.9 13.6 30.5 25.5 26.4
ReasonIR-8B 26.2 31.4 23.3 30.0 18.0 23.9 20.5 35.0 10.5 14.7 31.9 27.2 24.4
Qwen3-Embedding-8B 21.0 33.8 18.6 27.8 15.6 18.6 17.1 33.5 1.2 9.5 40.6 39.6 23.1
bm25 18.9 27.2 14.9 12.5 13.6 18.4 15.0 24.4 7.9 6.2 10.4 4.9 14.5
bge-large-en-v1.5 1.7 24.6 16.6 17.5 11.7 10.8 13.3 26.7 5.7 6.0 13.0 6.9 13.7

2、Retrieval Pipelines on Short Documents

Model Bio. Earth. Econ. Psy. Rob. Stack. Sus. Leet. Pony AoPS TheoQ. TheoT. Avg
Mira-Reasoning-Retrieval 86.7 78.5 69.7 78.2 58.4 67.0 65.9 46.8 73.4 45.2 60.6 72.3 66.9
INF-X-Retriever (inf+retrieve) 79.8 70.9 69.9 73.3 57.7 64.3 61.9 56.1 54.5 51.9 53.1 67.9 63.4
RakanEmb4B (inf+retrieve) 65.9 56.9 59.0 60.7 49.0 52.8 53.3 35.6 55.4 22.0 53.9 64.5 52.4
Nemo Retriever's Agentic Retrieval 72.8 66.0 48.7 59.6 52.5 47.1 50.2 49.3 42.1 21.0 53.3 48.0 50.9
DIVER-v3-GroupRank 66.0 63.7 42.4 55.0 40.6 44.7 50.4 32.5 47.3 17.2 46.4 55.6 46.8
BGE-Reasoner-0928 68.5 66.4 40.6 53.1 43.2 44.1 47.8 29.0 41.6 17.2 46.5 58.4 46.4
Lattice Hierarchical Retrieval 64.4 62.4 45.4 57.4 47.6 37.6 46.4 19.9 34.0 12.0 30.1 47.8 42.1
ReasonRank (rerank RaDer) 62.7 55.5 36.7 54.6 35.7 38.0 44.8 29.5 25.6 14.4 42.0 50.1 40.8
XRR2 63.1 55.4 38.5 52.9 37.1 38.2 44.6 21.9 35.0 15.7 34.4 46.2 40.3
RaDeR with Qwen reranking 58.0 59.2 33.0 49.4 31.8 39.0 36.4 33.5 33.3 10.8 34.2 51.6 39.2
ReasonIR with Rank-R1 59.5 55.1 37.9 52.7 30.0 39.3 45.1 32.1 17.1 10.7 40.4 45.6 38.8

3、Single-model Methods on Long Documents

Model Bio. Earth. Econ. Psy. Rob. Stack. Sus. Pony Avg
MRE-T1 46.5 46 34.5 52.7 27.7 22.2 45.2 6.3 35.1
Google-Gecko-Text_Embedding-004 - - - - - - - - 33.2
inf-retriever-v1-pro 44.1 42.2 31.4 43.1 20.8 21.4 41.0 0.4 30.5
gte-Qwen1.5-7B-instruct 39.2 36.1 25.7 42.3 21.3 23.5 33.1 1.3 27.8
SFR-Embedding-Mistral 30.3 37.0 24.3 47.7 17.3 14.5 35.0 2.0 26.0
GritLM-7B 37.5 40.3 25.7 34.4 17.8 20.1 32.4 0.0 26.0
e5-mistral-7b-instruct 29.9 36.3 26.2 46.7 17.3 14.5 32.2 1.1 25.5
voyage-large-2-instruct 34.4 35.4 26.7 41.6 12.9 12.8 31.1 1.3 24.5
(Google) text-embedding-preview0409 30.9 38.0 21.9 30.7 12.9 19.2 25.7 0.3 22.4
(OpenAI) text-embedding-3-large 32.1 31.4 23.8 34.2 11.9 10.7 26.3 0.0 21.3
Cohere-embed-english-v3.0 31.5 34.5 18.9 20.5 9.9 15.8 15.2 0.8 18.4
SBERT 25.6 34.1 18.9 15.8 10.9 15 18 1.2 17.4
bge-large-en-v1.5 16.4 27.7 20.9 11.6 10.9 13.3 16.9 0.4 14.8
BM25 10.7 15.4 10.7 8.4 7.4 22.2 10.7 5.4 11.4

4、Retrieval Pipelines on Long Documents

Model Bio. Earth. Econ. Psy. Rob. Stack. Sus. Pony Avg
Mira-Reasoning-Retrieval 77.1 59.0 71.2 73.8 46.0 35.5 70.6 14.6 56.0
INF-X-Retriever (inf+retrieve) 73.2 59.6 69.3 74.3 55.9 27.8 64.8 12.0 54.6