BM25 (Best Matching 25) is a ranking function that scores document relevance based on keyword matching. It uses term frequency (how often a word appears), inverse document frequency (how rare the word is), and document length normalization. Despite dating from 1994, it is still the default ranking function in Elasticsearch and Apache Lucene, and it is available in PostgreSQL through extensions such as pg_search.

Is BM25 better than semantic search?

Neither is universally better. BM25 excels at specific, keyword-heavy queries like 'API rate limit' or 'Plan B pricing.' Semantic search excels at vague, intent-based queries like 'I'm unhappy with the service.' The best AI chatbots use both in a hybrid approach, combining BM25 and vector similarity with Reciprocal Rank Fusion.

Why do AI chatbots still need BM25?

In RAG (Retrieval-Augmented Generation) systems, BM25 provides exact keyword matching that embeddings miss. When a customer asks about a specific product name, plan, or feature, BM25 finds the exact document while semantic search may return related but wrong content. BM25 is also 10-100× faster than vector search.

What is hybrid search?

Hybrid search runs BM25 (keyword) and semantic (vector) search in parallel, then merges results using Reciprocal Rank Fusion (RRF). In our setup at GetGenius this consistently outperforms either method alone: +23% recall vs semantic-only and +31% vs BM25-only.

BM25 Search: The 30-Year-Old Algorithm That Still Beats Neural Search

Every AI chatbot searches your knowledge base before answering. Most use embeddings. The smart ones also use BM25 — a keyword-matching algorithm from 1994 that, in head-to-head tests, still beats pure neural search for exact questions like "What's the price of Plan B?"

What Is BM25?

BM25 (Best Matching 25) is a ranking function that scores how relevant a document is to a search query based on the words they share. It grew out of the probabilistic retrieval work of Stephen Robertson and Karen Spärck Jones, and took its current form in the 1990s in the Okapi system built by Robertson and colleagues at City University, London.

Despite being 30+ years old, BM25 is still the default ranking algorithm in Elasticsearch and Apache Lucene. PostgreSQL's built-in full-text ranking (ts_rank) is not BM25, but extensions such as pg_search add it. Web search engines relied on lexical ranking of this kind for years before layering neural approaches on top.

Why it matters for AI chatbots

When a customer asks your chatbot "What's the cancellation policy?", you need to find the exact document that mentions "cancellation policy." Semantic search understands meaning, but BM25 matches the exact words. The best search systems use both.

[Figure: TF-IDF basics. Rare, specific terms score higher than common words.]

How BM25 Works (Without the Math)

BM25 scores documents based on three intuitions that are surprisingly hard to beat:

Term Frequency (TF)

A document that mentions "pricing" 5 times is probably more about pricing than one that mentions it once. But the 50th mention doesn't help much — diminishing returns.

Inverse Document Frequency (IDF)

The word "the" appears in every document — it's useless for ranking. "Cancellation" appears in 3 out of 100 documents — it's very useful. Rare words matter more.

Document Length Normalization

A 50-word FAQ that mentions "pricing" twice is more focused than a 5,000-word page that mentions it twice. BM25 adjusts for document length.

BM25 is TF-IDF's smarter cousin. Same idea — rare words in focused documents score highest — but with better math that prevents long documents from dominating.

The BM25 Formula (Simplified)

For those who want the actual formula — here's the simplified version:

Component	Formula	What It Does
IDF	log((N - n + 0.5) / (n + 0.5))	Penalizes common words, rewards rare ones
TF saturation	(f × (k1 + 1)) / (f + k1 × (1 - b + b × dl/avgdl))	Counts word frequency with diminishing returns
k1	Typically 1.2	Controls how fast TF saturates
b	Typically 0.75	Controls document length normalization

N = total documents, n = documents containing the term, f = term frequency in this document, dl = document length, avgdl = average document length.
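To make the table concrete, here is a minimal Python sketch that plugs the two components together and sums them over the query terms. The tiny corpus, the whitespace tokenizer, and the function names are illustrative assumptions, not our production code.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase/whitespace tokenizer, for illustration only.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the tokenized query."""
    N = len(docs)                                  # total documents
    avgdl = sum(len(d) for d in docs) / N          # average document length
    df = Counter(t for d in docs for t in set(d))  # n: docs containing each term
    scores = []
    for d in docs:
        tf = Counter(d)
        dl = len(d)
        score = 0.0
        for term in query:
            f = tf[term]
            if f == 0:
                continue  # term absent from this document
            n = df[term]
            idf = math.log((N - n + 0.5) / (n + 0.5))
            norm = 1 - b + b * dl / avgdl
            score += idf * (f * (k1 + 1)) / (f + k1 * norm)
        scores.append(score)
    return scores

docs = [tokenize(t) for t in [
    "plan b pricing starts at 20 dollars per month",
    "our cancellation policy allows refunds within 30 days",
    "contact support for help with your account",
    "the api rate limit depends on your plan",
    "shipping times vary by region",
]]
scores = bm25_scores(tokenize("cancellation policy"), docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

One caveat with this exact IDF form: a term that appears in more than half the documents gets a negative IDF. Lucene-style implementations avoid that by adding 1 inside the logarithm.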

The key insight

BM25's TF saturation is what makes it better than raw TF-IDF. With a raw term count, a document with 100 mentions of "pricing" scores 10× higher than one with 10 mentions. In BM25 with k1 = 1.2 it scores only about 1.1× higher, even before length normalization is applied, because after a certain point more mentions don't make the document more relevant. They just make it longer.
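The saturation effect is easy to check by hand. The sketch below evaluates only the TF part of the formula for 10 and 100 mentions, assuming k1 = 1.2 and a document of exactly average length (so the length term equals 1). The exact ratio shifts with k1 and with document length.

```python
def tf_component(f, k1=1.2, norm=1.0):
    # norm stands for (1 - b + b * dl/avgdl); 1.0 means average length.
    return f * (k1 + 1) / (f + k1 * norm)

ten = tf_component(10)
hundred = tf_component(100)
ratio = hundred / ten

print(f"10 mentions:  {ten:.3f}")
print(f"100 mentions: {hundred:.3f}")
print(f"ratio:        {ratio:.2f}")
```

However large f gets, the TF part never exceeds k1 + 1, which is 2.2 with the default k1.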

[Figure: In modern RAG, BM25 and vector search work together for the best results.]

BM25 vs Semantic (Vector) Search

Modern AI chatbots typically use semantic search (embeddings) to find relevant content. Here's how the two approaches compare:

Dimension	BM25 (Keyword)	Semantic (Embeddings)
How it works	Matches exact words	Matches meaning via vector similarity
"What is your pricing?"	🟢 Finds pages with "pricing"	🟢 Finds pages about costs, even without the word
"Plan B cancellation policy"	🟢 Exact match — finds it instantly	🟡 Might confuse with other plans
"I'm unhappy with the service"	🔴 No keyword match for complaints	🟢 Understands this is a complaint
Speed	🟢 Sub-millisecond (inverted index)	🟡 5-50ms (vector similarity)
Explainability	🟢 Clear — "matched on these words"	🔴 Black box — "closest vector"
Zero-shot	🟢 Works immediately, no model needed	🔴 Requires embedding model
Typo tolerance	🔴 "pricng" won't match "pricing"	🟢 Often tolerates misspellings, though not reliably

Neither approach wins alone. BM25 excels at exact, specific queries. Semantic search excels at vague, intent-based queries. The best systems use both — this is called hybrid search.

Hybrid Search: Why We Use Both

At GetGenius, we run both BM25 and semantic search in parallel, then merge the results using Reciprocal Rank Fusion (RRF). This approach consistently outperforms either method alone:

Recall vs semantic-only: +23%
Recall vs BM25-only: +31%
Combined latency: <50ms

Semantic search only

Customer asks "What's the SLA for Enterprise?" → Semantic search returns 5 documents about service levels generically. The specific Enterprise SLA doc is ranked #4.

Hybrid (BM25 + Semantic)

Same query → BM25 boosts the doc containing "Enterprise" and "SLA" as exact words. Semantic adds docs about service commitments. After RRF fusion, the Enterprise SLA doc is #1.

How We Implement BM25 in Production

We use PostgreSQL with pg_search, the BM25 extension from ParadeDB, for BM25 scoring. This lets us run BM25 and vector search in the same database, with no separate Elasticsearch cluster.

Index creation

When content is ingested, we build a BM25 inverted index alongside the vector embeddings. Both live in the same PostgreSQL database, on the same rows.
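For readers who have not seen one, an inverted index is just a map from each term to the documents that contain it. The toy in-memory version below is an assumption-laden sketch to show the idea. It is not how pg_search stores its index, and the document IDs and texts are made up.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}, and record document lengths."""
    index = defaultdict(dict)
    doc_lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # naive tokenizer, for illustration
        doc_lengths[doc_id] = len(tokens)
        for tok in tokens:
            index[tok][doc_id] = index[tok].get(doc_id, 0) + 1
    return index, doc_lengths

docs = {
    "faq-1": "Enterprise SLA guarantees 99.9 percent uptime",
    "faq-2": "Pro plan API rate limit",
    "faq-3": "Enterprise plan pricing",
}
index, doc_lengths = build_inverted_index(docs)
print(sorted(index["enterprise"]))
```

At query time only the posting lists for the query terms are read, which is why keyword lookup stays fast as the corpus grows.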

Parallel search

At query time, we fire two queries simultaneously: a BM25 text search and a vector similarity search. Both return scored, ranked results.
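A rough sketch of the fan-out, using a thread pool from the Python standard library. The two search functions are hypothetical stand-ins that return canned document IDs. In a real system each would issue a database query.

```python
from concurrent.futures import ThreadPoolExecutor

def bm25_search(query):
    # Hypothetical stand-in for a BM25 text query against the database.
    return ["enterprise-sla", "enterprise-pricing"]

def vector_search(query):
    # Hypothetical stand-in for a vector similarity query.
    return ["service-levels", "uptime-faq", "support-hours", "enterprise-sla"]

def parallel_search(query):
    # Submit both searches at once, then wait for both ranked lists.
    with ThreadPoolExecutor(max_workers=2) as pool:
        bm25_future = pool.submit(bm25_search, query)
        vector_future = pool.submit(vector_search, query)
        return bm25_future.result(), vector_future.result()

keyword_hits, semantic_hits = parallel_search("What's the SLA for Enterprise?")
print(keyword_hits, semantic_hits)
```

Because the two searches run concurrently, total latency is roughly that of the slower one rather than the sum of both.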

Reciprocal Rank Fusion (RRF)

Results from both searches are merged using RRF, which combines rankings without needing to normalize scores across different scales. A document ranked #1 by both methods scores highest.
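As a minimal sketch, RRF gives each document a score of 1 / (k + rank) in every list where it appears and sums those scores. The constant k = 60 is a commonly used value and an assumption here, as are the document IDs.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs; k=60 is an assumed, commonly used constant."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda d: scores[d], reverse=True)

bm25_ranking = ["enterprise-sla", "enterprise-pricing"]
vector_ranking = ["service-levels", "uptime-faq", "support-hours", "enterprise-sla"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Here the document ranked #1 by BM25 and only #4 by vector search ends up first, because it is the only one that appears in both lists. Only ranks are used, so the raw BM25 and cosine scores never need to be put on a common scale.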

Reranking (optional)

For highest quality, we apply a reranker to the fused results — either a cross-encoder or logprob LLM reranker — for the final ordering.

Why PostgreSQL, not Elasticsearch?

Elasticsearch is the traditional choice for BM25. But running a separate search cluster adds operational complexity and cost. PostgreSQL's pg_search extension gives us production-grade BM25 in the same database where our vectors, metadata, and application data already live. One database, not two.

When BM25 Wins (And When It Doesn't)

✅ BM25 wins for:

Specific, keyword-heavy queries — "Plan B pricing", "API rate limits", "return policy"
Named entities — Product names, feature names, plan names, people's names
Exact phrases — "30-day money-back guarantee"
Technical documentation — Error codes, config keys, function names
Low-latency requirements — BM25 is 10-100× faster than vector search

❌ Semantic search wins for:

Vague, intent-based queries — "I'm not happy", "something went wrong"
Paraphrased questions — "How much does it cost?" vs "What's the pricing?"
Cross-language queries — Query in French, content in English
Conceptual similarity — Finding content about "refunds" when the query mentions "getting money back"

BM25 Tuning: The k1 and b Parameters

BM25 has two tunable parameters that significantly affect results:

Parameter	Default	Effect	When to Adjust
k1	1.2	TF saturation speed	Lower (0.5-0.8) for short docs. Higher (1.5-2.0) for long docs where repetition matters.
b	0.75	Length normalization strength	Set to 0 if all docs are similar length. Set to 1.0 if doc lengths vary wildly (mix of FAQs and long articles).
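To get a feel for b, the sketch below scores the TF part for a 50-word FAQ and a 5,000-word page that each mention a term twice. The average length of 500 words is an assumed value, and k1 stays at 1.2.

```python
def tf_component(f, dl, avgdl, k1=1.2, b=0.75):
    # Length normalization: b=0 ignores length, b=1 applies it fully.
    norm = 1 - b + b * dl / avgdl
    return f * (k1 + 1) / (f + k1 * norm)

results = {}
for b in (0.0, 0.75, 1.0):
    short_doc = tf_component(2, dl=50, avgdl=500, b=b)
    long_doc = tf_component(2, dl=5000, avgdl=500, b=b)
    results[b] = (short_doc, long_doc)
    print(f"b={b}: short={short_doc:.3f} long={long_doc:.3f}")
```

With b = 0 both documents score the same. As b rises toward 1, the short FAQ pulls further ahead of the long page.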

Practical tip

For AI chatbot knowledge bases (mix of FAQ pages, help articles, and product pages), the defaults of k1=1.2, b=0.75 work well. Don't over-tune — the hybrid approach with semantic search compensates for BM25's weaknesses.

The Missing Layer in Most AI Chatbot Platforms

Most AI chatbot platforms primarily rely on vector embeddings for retrieval. They chunk your content, create embeddings, and search by cosine similarity. While some may add basic keyword matching, few implement true hybrid search with BM25 and Reciprocal Rank Fusion.

This works fine for conversational questions but struggles with specific, keyword-heavy ones:

Embeddings only (most platforms)

Customer: "What's the API rate limit for the Pro plan?" → Semantic search returns 5 documents vaguely about APIs and plans. The specific rate-limit doc is buried at #3.

Hybrid search (GetGenius)

Same query → BM25 finds the exact doc containing "API rate limit" and "Pro plan." Semantic search adds context docs about API usage. Correct answer is #1.

BM25 in the Age of LLMs

Some people assume BM25 is obsolete now that we have LLMs. The opposite is true — BM25 is more important in the RAG (Retrieval-Augmented Generation) era:

RAG needs retrieval — An LLM can only answer from content you put in its context window. BM25 helps find the right content to include.
Precision matters — Feeding the LLM wrong documents wastes tokens and produces wrong answers. BM25's exact matching reduces noise.
Speed matters — In real-time chat, every millisecond counts. BM25 is 10-100× faster than vector search.
Complementary signals — BM25 and embeddings fail on different queries. Using both covers more ground, as our query expansion approach demonstrates.

BM25 isn't a legacy technology being replaced by AI. It's a foundational retrieval layer that makes AI chatbots more accurate, faster, and more reliable.

Try It Yourself

Want to see hybrid search in action? Our free demo tools let you test how AI answers questions using your own content:

Chat with Website — Enter any URL and see hybrid search find answers
Chat with Document — Upload a document and ask specific questions
Chat with Text — Paste content and test exact-match queries
AI Visibility Score — Check if your content is optimized for AI search
