BM25 Search: The 30-Year-Old Algorithm That Still Beats Neural Search

April 22, 2026 · 9 min read

Every AI chatbot searches your knowledge base before answering. Most use embeddings. The smart ones also use BM25 — a keyword-matching algorithm from 1994 that, in head-to-head tests, still beats pure neural search for exact questions like "What's the price of Plan B?"

What Is BM25?

BM25 (Best Matching 25) is a ranking function that scores how relevant a document is to a search query based on the words they share. It was developed in the 1990s by Stephen Robertson and colleagues at City University London, building on the probabilistic relevance framework Robertson had created with Karen Spärck Jones.

Despite being 30+ years old, BM25 is still the default ranking algorithm in Apache Lucene, Elasticsearch, and OpenSearch, and is available in PostgreSQL through extensions. Google used a BM25 variant for years before layering neural approaches on top.

Why it matters for AI chatbots
When a customer asks your chatbot "What's the cancellation policy?", you need to find the exact document that mentions "cancellation policy." Semantic search understands meaning, but BM25 matches the exact words. The best search systems use both.

How BM25 Works (Without the Math)

BM25 scores documents based on three intuitions that are surprisingly hard to beat:

Term Frequency (TF)
A document that mentions "pricing" 5 times is probably more about pricing than one that mentions it once. But the 50th mention doesn't help much — diminishing returns.
Inverse Document Frequency (IDF)
The word "the" appears in every document — it's useless for ranking. "Cancellation" appears in 3 out of 100 documents — it's very useful. Rare words matter more.
Document Length Normalization
A 50-word FAQ that mentions "pricing" twice is more focused than a 5,000-word page that mentions it twice. BM25 adjusts for document length.

BM25 is TF-IDF's smarter cousin. Same idea — rare words in focused documents score highest — but with better math that prevents long documents from dominating.

The BM25 Formula (Simplified)

For those who want the actual formula — here's the simplified version:

Component     | Formula                                            | What It Does
IDF           | log((N - n + 0.5) / (n + 0.5))                     | Penalizes common words, rewards rare ones
TF saturation | (f × (k1 + 1)) / (f + k1 × (1 - b + b × dl/avgdl)) | Counts word frequency with diminishing returns
k1            | Typically 1.2                                      | Controls how fast TF saturates
b             | Typically 0.75                                     | Controls document length normalization

N = total documents, n = documents containing the term, f = term frequency in this document, dl = document length, avgdl = average document length.
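Translated into code, the formula above is only a few lines. Here is a minimal sketch in Python, not a production implementation: the corpus is a plain list of tokenized documents and we scan it per term, whereas real engines precompute an inverted index. The `+ 1` inside the log is a common Lucene-style tweak (an assumption beyond the table above) that keeps IDF non-negative for very common terms.

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query with BM25.

    Variable names match the formula table: N, n, f, dl, avgdl.
    """
    N = len(corpus)                               # total documents
    avgdl = sum(len(d) for d in corpus) / N       # average document length
    dl = len(doc)                                 # this document's length
    score = 0.0
    for term in query_terms:
        n = sum(1 for d in corpus if term in d)   # documents containing the term
        if n == 0:
            continue
        # The "+ 1" keeps IDF non-negative (Lucene-style variant).
        idf = math.log((N - n + 0.5) / (n + 0.5) + 1)
        f = doc.count(term)                       # term frequency in this document
        tf = (f * (k1 + 1)) / (f + k1 * (1 - b + b * dl / avgdl))
        score += idf * tf
    return score
```

Ranking a corpus is then just scoring every document and sorting; engines like Lucene compute the same quantity from a precomputed index instead of rescanning.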

The key insight
BM25's TF saturation is what makes it better than raw TF-IDF. In TF-IDF, a document with 100 mentions of "pricing" scores 10× higher than one with 10 mentions. In BM25, it scores only marginally higher — because after a certain point, more mentions don't make the document more relevant. They just make it longer.
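You can see the saturation directly by plugging increasing term frequencies into BM25's TF component. A toy sketch, with length normalization switched off (b=0) to isolate the effect:

```python
def tf_component(f, k1=1.2, b=0.0, dl=100, avgdl=100):
    # BM25's term-frequency factor; with b=0, document length is ignored.
    return (f * (k1 + 1)) / (f + k1 * (1 - b + b * dl / avgdl))

# Raw TF grows 1 -> 10 -> 100; BM25's factor grows 1.00 -> 1.96 -> 2.17,
# approaching but never exceeding k1 + 1 = 2.2.
for f in (1, 10, 100):
    print(f, round(tf_component(f), 2))
```

The factor is bounded above by k1 + 1, which is exactly why the k1 parameter is described as controlling how fast TF saturates.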

BM25 vs Semantic (Vector) Search

Modern AI chatbots typically use semantic search (embeddings) to find relevant content. Here's how the two approaches compare:

Dimension                     | BM25 (Keyword)                        | Semantic (Embeddings)
How it works                  | Matches exact words                   | Matches meaning via vector similarity
"What is your pricing?"       | 🟢 Finds pages with "pricing"         | 🟢 Finds pages about costs, even without the word
"Plan B cancellation policy"  | 🟢 Exact match — finds it instantly   | 🟡 Might confuse with other plans
"I'm unhappy with the service"| 🔴 No keyword match for complaints    | 🟢 Understands this is a complaint
Speed                         | 🟢 Sub-millisecond (inverted index)   | 🟡 5-50ms (vector similarity)
Explainability                | 🟢 Clear — "matched on these words"   | 🔴 Black box — "closest vector"
Zero-shot                     | 🟢 Works immediately, no model needed | 🔴 Requires embedding model
Typo tolerance                | 🔴 "pricng" won't match "pricing"     | 🟢 Handles misspellings naturally

Neither approach wins alone. BM25 excels at exact, specific queries. Semantic search excels at vague, intent-based queries. The best systems use both — this is called hybrid search.

Hybrid Search: Why We Use Both

At GetGenius, we run both BM25 and semantic search in parallel, then merge the results using Reciprocal Rank Fusion (RRF). This approach consistently outperforms either method alone:

  • +23% recall vs semantic-only
  • +31% recall vs BM25-only
  • <50ms combined latency
Semantic search only
Customer asks "What's the SLA for Enterprise?" → Semantic search returns 5 documents about service levels generically. The specific Enterprise SLA doc is ranked #4.
Hybrid (BM25 + Semantic)
Same query → BM25 boosts the doc containing "Enterprise" and "SLA" as exact words. Semantic adds docs about service commitments. After RRF fusion, the Enterprise SLA doc is #1.

How We Implement BM25 in Production

We use PostgreSQL with pg_search (formerly ParadeDB) for BM25 scoring. This lets us run BM25 and vector search in the same database — no separate Elasticsearch cluster needed.

1. Index creation
When content is ingested, we build a BM25 inverted index alongside the vector embeddings. Both live in the same PostgreSQL database, on the same rows.

2. Parallel search
At query time, we fire two queries simultaneously: a BM25 text search and a vector similarity search. Both return scored, ranked results.

3. Reciprocal Rank Fusion (RRF)
Results from both searches are merged using RRF, which combines rankings without needing to normalize scores across different scales. A document ranked #1 by both methods scores highest.

4. Reranking (optional)
For highest quality, we apply a reranker to the fused results — either a cross-encoder or logprob LLM reranker — for the final ordering.
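The fusion step is simple enough to sketch in a few lines of Python. The document IDs below are hypothetical, invented for illustration; k=60 is the commonly used default constant for RRF:

```python
def rrf_merge(rankings, k=60):
    """Fuse several ranked lists: each doc scores sum of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for "What's the SLA for Enterprise?":
bm25_hits   = ["enterprise-sla", "plan-overview", "support-hours"]
vector_hits = ["service-levels", "enterprise-sla", "uptime-faq"]
fused = rrf_merge([bm25_hits, vector_hits])
# "enterprise-sla" appears in both lists, so it rises to the top.
```

Because RRF only looks at rank positions, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.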
Why PostgreSQL, not Elasticsearch?
Elasticsearch is the traditional choice for BM25. But running a separate search cluster adds operational complexity and cost. PostgreSQL's pg_search extension gives us production-grade BM25 in the same database where our vectors, metadata, and application data already live. One database, not two.

When BM25 Wins (And When It Doesn't)

✅ BM25 wins for:

  • Specific, keyword-heavy queries — "Plan B pricing", "API rate limits", "return policy"
  • Named entities — Product names, feature names, plan names, people's names
  • Exact phrases — "30-day money-back guarantee"
  • Technical documentation — Error codes, config keys, function names
  • Low-latency requirements — BM25 is 10-100× faster than vector search

❌ Semantic search wins for:

  • Vague, intent-based queries — "I'm not happy", "something went wrong"
  • Paraphrased questions — "How much does it cost?" vs "What's the pricing?"
  • Cross-language queries — Query in French, content in English
  • Conceptual similarity — Finding content about "refunds" when the query mentions "getting money back"

BM25 Tuning: The k1 and b Parameters

BM25 has two tunable parameters that significantly affect results:

Parameter | Default | Effect                        | When to Adjust
k1        | 1.2     | TF saturation speed           | Lower (0.5-0.8) for short docs. Higher (1.5-2.0) for long docs where repetition matters.
b         | 0.75    | Length normalization strength | Set to 0 if all docs are similar length. Set to 1.0 if doc lengths vary wildly (mix of FAQs and long articles).
Practical tip
For AI chatbot knowledge bases (mix of FAQ pages, help articles, and product pages), the defaults of k1=1.2, b=0.75 work well. Don't over-tune — the hybrid approach with semantic search compensates for BM25's weaknesses.
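The effect of b is easy to check numerically. A hypothetical sketch: two documents each mention a term twice, one 50 words long and one 5,000 words long, in a corpus whose average document is 500 words:

```python
def bm25_tf(f, dl, avgdl, k1=1.2, b=0.75):
    # BM25's term-frequency factor, including length normalization.
    return (f * (k1 + 1)) / (f + k1 * (1 - b + b * dl / avgdl))

short_doc = bm25_tf(f=2, dl=50, avgdl=500)     # focused FAQ
long_doc  = bm25_tf(f=2, dl=5000, avgdl=500)   # sprawling page
# With the default b=0.75, the short document scores several times
# higher for the same two mentions.

# With b=0, length normalization is off and both documents tie:
assert bm25_tf(2, 50, 500, b=0) == bm25_tf(2, 5000, 500, b=0)
```

This is the "50-word FAQ vs 5,000-word page" intuition from earlier, made concrete.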

The Missing Layer in Most AI Chatbot Platforms

Most AI chatbot platforms primarily rely on vector embeddings for retrieval. They chunk your content, create embeddings, and search by cosine similarity. While some may add basic keyword matching, few implement true hybrid search with BM25 and Reciprocal Rank Fusion.

This works fine for conversational questions but struggles with specific, keyword-heavy ones:

Embeddings only (most platforms)
Customer: "What's the API rate limit for the Pro plan?" → Semantic search returns 5 documents vaguely about APIs and plans. The specific rate-limit doc is buried at #3.
Hybrid search (GetGenius)
Same query → BM25 finds the exact doc containing "API rate limit" and "Pro plan." Semantic search adds context docs about API usage. Correct answer is #1.

BM25 in the Age of LLMs

Some people assume BM25 is obsolete now that we have LLMs. The opposite is true — BM25 is more important in the RAG (Retrieval-Augmented Generation) era:

  1. RAG needs retrieval — An LLM can only answer from content you put in its context window. BM25 helps find the right content to include.
  2. Precision matters — Feeding the LLM wrong documents wastes tokens and produces wrong answers. BM25's exact matching reduces noise.
  3. Speed matters — In real-time chat, every millisecond counts. BM25 is 10-100× faster than vector search.
  4. Complementary signals — BM25 and embeddings fail on different queries. Using both covers more ground, as our query expansion approach demonstrates.

BM25 isn't a legacy technology being replaced by AI. It's a foundational retrieval layer that makes AI chatbots more accurate, faster, and more reliable.

Try It Yourself

Want to see hybrid search in action? Our free demo tools let you test how AI answers questions using your own content:


Related: Query Expansion: Finding More Answers | Cross-Encoder vs Logprob Reranking | Knowledge Lint: Auditing Your Training Data

Build a smarter AI chatbot

GetGenius trains on your website and docs to deliver accurate, consistent answers 24/7. No per-seat pricing. AI included in every plan.

Start free trial
