# Leaderboard
This page tracks benchmark results for HybridRAG-Bench.
## Summary
Metric columns can be adapted to your final evaluation protocol.
Higher is better unless a column explicitly states lower-is-better.
Update the table by editing `docs/source/_static/leaderboard.csv`.
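For illustration, a row in that CSV might look like the following. The exact header names are an assumption mirroring the table columns on this page; check the actual file before editing:

```
Rank,Date,Model,Method,Dataset,EM,F1,Faithfulness,Notes
1,2026-02-12,Llama-3.3-70B,HybridRAG,movie,0.000,0.000,0.000,placeholder
```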
## Current Results
| Rank | Date | Model | Method | Dataset | EM | F1 | Faithfulness | Notes |
|---|---|---|---|---|---|---|---|---|
| 1 | 2026-02-12 | Llama-3.3-70B | HybridRAG | movie | 0.000 | 0.000 | 0.000 | placeholder |
| 2 | 2026-02-12 | Llama-3.3-70B | KG-RAG | movie | 0.000 | 0.000 | 0.000 | placeholder |
| 3 | 2026-02-12 | Llama-3.3-70B | RAG | movie | 0.000 | 0.000 | 0.000 | placeholder |
| 4 | 2026-02-12 | Llama-3.3-70B | IO | movie | 0.000 | 0.000 | 0.000 | placeholder |
## Submission Format
Use this schema when adding new results:
- Date: YYYY-MM-DD
- Model: Model name and size
- Method: IO, RAG, KG-RAG, HybridRAG, etc.
- Dataset: Evaluated split/domain
- EM: Exact match
- F1: Token-level F1
- Faithfulness: Attribution/grounding consistency
- Notes: Optional details (retriever, hops, or config)
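A submitted row can be sanity-checked before it lands in the CSV. The sketch below is a minimal, hypothetical validator (not part of the repo) assuming the column names above and that EM, F1, and Faithfulness are floats in [0, 1]:

```python
import datetime

# Assumed CSV column names, mirroring the submission schema above.
COLUMNS = ["Rank", "Date", "Model", "Method", "Dataset",
           "EM", "F1", "Faithfulness", "Notes"]

def validate_row(row: dict) -> list:
    """Return a list of problems with a leaderboard row; empty means OK."""
    problems = []
    for col in COLUMNS:
        if col not in row:
            problems.append("missing column: " + col)
    # Date must parse as YYYY-MM-DD.
    try:
        datetime.date.fromisoformat(row.get("Date", ""))
    except ValueError:
        problems.append("Date must be YYYY-MM-DD")
    # Metric columns must be numbers in [0, 1] (assumed score range).
    for metric in ("EM", "F1", "Faithfulness"):
        try:
            value = float(row.get(metric, ""))
            if not 0.0 <= value <= 1.0:
                problems.append(metric + " out of [0, 1]")
        except ValueError:
            problems.append(metric + " is not a number")
    return problems
```

Run it on a candidate row and reject the submission if the returned list is non-empty.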