How many query expansions should I generate?

Two is the sweet spot. The first expansion reformulates vocabulary (synonym substitution); the second extracts searchable keywords. Beyond two, you add noise without proportional improvement: the first expansion already catches ~80% of vocabulary mismatch, and the second roughly another 15%.

What is position-aware blending in search reranking?

Position-aware blending mixes retrieval scores and reranker scores at different ratios depending on result position. The top 1-3 results use 75% retrieval / 25% reranker (protecting exact matches), positions 4-10 use 60% / 40%, and results ranked 11+ use 40% retrieval / 60% reranker (trusting the reranker more on uncertain tail results).

When should you skip query expansion?

Skip expansion for very short queries (1-2 words), exact identifiers (error codes, SKUs), and queries where the first-pass search returns results with >0.85 similarity. Natural language questions and multi-concept queries benefit most from expansion.

What is logprob-based LLM reranking?

Instead of using a specialized cross-encoder model, logprob reranking asks a general-purpose LLM 'Is this document relevant? Yes/No' and reads the probability distribution behind the answer. A 95% confidence in 'Yes' gives a score of ~0.95, while 55% confidence gives ~0.55 — providing a continuous relevance score with full LLM reasoning capability.
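
As a rough illustration of the scoring step, the sketch below turns the log-probabilities of the 'Yes' and 'No' tokens into a score between 0 and 1. Renormalizing over just those two tokens is an assumption, and `relevance_from_logprobs` is a hypothetical helper name; how you obtain the log-probabilities depends on your LLM API and is not shown.

```python
import math

def relevance_from_logprobs(logprob_yes, logprob_no):
    """Convert 'Yes'/'No' token log-probabilities into a relevance score in [0, 1].

    Assumption: probability mass on any other token is ignored, so the two
    options are renormalized against each other.
    """
    # Equivalent to p_yes / (p_yes + p_no), written in a numerically stable form
    return 1.0 / (1.0 + math.exp(logprob_no - logprob_yes))

# A model that is 95% confident in 'Yes' (and 5% in 'No')
score = relevance_from_logprobs(math.log(0.95), math.log(0.05))
```

Here `score` comes out at roughly 0.95, matching the example in the answer above.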

Implementing Query Expansion: 6 Hard Lessons from Building Multi-Query Search

Query expansion sounds simple: search with three versions of a question instead of one. In practice, every step has pitfalls. Here are six lessons from building multi-query search in production.

Quick Recap

In a standard chatbot, the user's question becomes one search query. Query expansion generates two alternative versions, runs them all in parallel, and merges the results. The goal: find relevant pages that the original wording would miss.

(For the full overview, read Query Expansion: How AI Chatbots Find Answers You Didn't Know You Had.)

[Figure: Weighted Reciprocal Rank Fusion merging results from multiple queries. The original query gets 2× weight in the merge, so it should always win ties.]

Lesson 1: The Original Query Must Win Ties

Think of expansion queries as "scouts" — they explore territory the original can't reach. But when the original finds what it's looking for, the scouts should step aside.

The customer's own words are almost always the best search signal. Expansion fills gaps — it shouldn't overpower the original. The fix: give the original query 2× weight in result scoring. If a page ranks #1 for the original and #5 for an expansion, it stays near the top.
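
A minimal sketch of the weighted merge, assuming Reciprocal Rank Fusion with the commonly used smoothing constant k = 60 (an assumption; the article only specifies the 2× weight). The function name `weighted_rrf` and the document IDs are illustrative.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    rankings: one ranked list (best first) per query.
    weights:  one weight per ranking; the original query gets 2.0 here.
    k:        smoothing constant (60 is a common default, assumed here).
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# The original query and one expansion disagree about which page is best.
original = ["error-e4021", "troubleshooting"]
expansion = ["troubleshooting", "error-e4021"]

fused = weighted_rrf([original, expansion], weights=[2.0, 1.0])
```

With equal weights the two pages would tie exactly. With the original at 2×, `error-e4021` comes first in `fused`.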

[Figure: Position-aware score blending protecting top search results. Top-3 results get 75% retrieval weight to protect exact matches from reranker demotion.]

Lesson 2: Position-Aware Blending Protects Exact Matches

Many chatbots use a reranker — a second model that re-scores search results. Rerankers are powerful, but they can push down exact matches in favor of documents with "richer context."

The problem

A user searches for "error code E-4021." BM25 finds the exact error page, but the reranker scores a general "Troubleshooting Guide" higher because that page has more semantic overlap with the query.

The fix: position-aware blending

Top results trust the original search score. Lower results trust the reranker more. Exact matches stay at the top.

Position	Retrieval weight	Reranker weight	Why
Top 1–3	75%	25%	Protect exact matches
4–10	60%	40%	Balanced blend
11+	40%	60%	Trust reranker on weaker results

You can add position-aware blending even without query expansion. It improves reranking quality immediately.
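
The table translates into a few lines of code. This is a sketch under two assumptions the article does not spell out: "position" means the 1-based rank in the retrieval ordering, and both scores are already normalized to the same 0-1 range. The function names are illustrative.

```python
def blend_weights(position):
    """Return (retrieval_weight, reranker_weight) for a 1-based position."""
    if position <= 3:
        return 0.75, 0.25   # protect exact matches
    if position <= 10:
        return 0.60, 0.40   # balanced blend
    return 0.40, 0.60       # trust the reranker on weaker results

def position_aware_blend(results):
    """Blend scores and re-sort.

    results: list of (doc_id, retrieval_score, reranker_score) tuples,
             ordered by retrieval rank (best first).
    """
    blended = []
    for position, (doc_id, retrieval, reranker) in enumerate(results, start=1):
        w_retrieval, w_reranker = blend_weights(position)
        blended.append((doc_id, w_retrieval * retrieval + w_reranker * reranker))
    return sorted(blended, key=lambda item: item[1], reverse=True)

# The reranker prefers the general guide, but the exact match ranked first in retrieval.
results = [
    ("error-e4021", 0.95, 0.60),
    ("troubleshooting-guide", 0.70, 0.90),
]
ranked = position_aware_blend(results)
```

In this example the exact error page stays on top after blending, even though the reranker alone would have put the guide first.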

Lesson 3: Two Good Expansions Beat Five Mediocre Ones

It's tempting to generate 5 or 10 alternatives to maximize coverage. In practice, two well-crafted expansions beat five mediocre ones:

More queries = more noise

Mediocre expansions push irrelevant documents into results through sheer volume.

Diminishing returns

The first expansion catches 80% of vocabulary mismatch. The second catches another 15%. After that, you're adding noise.

The sweet spot — two alternatives with specific goals:

Vocabulary reformulation

Use different words for the same concept. "warranty" → "guarantee and replacement policy"

Keyword extraction

Pull out searchable terms. "Do you ship to Berlin?" → "Berlin shipping delivery EU coverage"
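
One way to generate exactly these two expansions is a pair of narrow prompts, one per goal. The prompt wording below and the `complete` callable (any function that takes a prompt string and returns the model's text) are assumptions for illustration, not the article's actual prompts or a specific LLM client.

```python
# Hypothetical prompt templates, one per expansion goal
EXPANSION_PROMPTS = {
    "vocabulary": (
        "Rewrite this search query using different words for the same concepts. "
        "Return only the rewritten query.\n\nQuery: {query}"
    ),
    "keywords": (
        "Extract the searchable keywords from this query. "
        "Return only the keywords, separated by spaces.\n\nQuery: {query}"
    ),
}

def expand_query(query, complete):
    """Return [original, expansion, ...] with empty or duplicate expansions dropped.

    complete: callable(str) -> str wrapping whatever LLM you use (assumed).
    """
    queries = [query]
    seen = {query.lower()}
    for template in EXPANSION_PROMPTS.values():
        candidate = complete(template.format(query=query)).strip()
        if candidate and candidate.lower() not in seen:
            seen.add(candidate.lower())
            queries.append(candidate)
    return queries

# Stand-in for a real model call, so the sketch runs without an API key
def fake_complete(prompt):
    if "different words" in prompt:
        return "guarantee and replacement policy"
    return "warranty coverage period"

queries = expand_query("What does the warranty cover?", fake_complete)
```

The original query always stays first in the list, which makes it easy to give it the higher weight during fusion.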

Lesson 4: Know When to Skip

Query Type	Example	Expand?
Very short (1–2 words)	"pricing"	❌ Skip
Exact identifiers	"error E-4021"	❌ Skip
Product codes	"SKU-PRO-2026-X"	❌ Skip
Natural questions	"What's your return policy?"	✅ Expand
Multi-concept queries	"shipping time for bulk orders"	✅ Expand

Start simple: expand queries with 3+ words, skip shorter ones. Add smarter heuristics later once you have data.
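
The "start simple" rule fits in a small function. The identifier check below (any token that mixes letters and digits) is a crude heuristic assumed for illustration; it will also flag tokens like "2nd", so treat it as a starting point rather than a finished classifier.

```python
def looks_like_identifier(token):
    """Heuristic: a token mixing letters and digits, e.g. 'E-4021' or 'SKU-PRO-2026-X'."""
    token = token.strip(".,?!\"'")
    has_digit = any(ch.isdigit() for ch in token)
    has_alpha = any(ch.isalpha() for ch in token)
    return has_digit and has_alpha

def should_expand(query, min_words=3):
    """Expand only queries with at least min_words words and no identifier-like token."""
    tokens = query.split()
    if len(tokens) < min_words:
        return False
    if any(looks_like_identifier(token) for token in tokens):
        return False
    return True

decisions = {
    q: should_expand(q)
    for q in [
        "pricing",
        "error E-4021",
        "SKU-PRO-2026-X",
        "What's your return policy?",
        "shipping time for bulk orders",
    ]
}
```

On the five example queries from the table, this skips the first three and expands the last two.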

Lesson 5: Feature Flags Are Non-Negotiable

Query expansion changes search behavior fundamentally. Rolling it out without the ability to toggle it per customer is asking for trouble.

Level	Purpose	Default
Global	Kill switch for the entire feature	On
Per-chatbot	Enable/disable for individual instances	Off (opt-in after testing)
Per-query	Auto-skip for short/identifier queries	Always on when expansion is enabled

The worst scenario: enabling expansion globally, seeing 30% better average quality, but not noticing that 5% of exact-match queries got worse. Feature flags let you catch this in a controlled environment.
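
The three levels can be resolved in order, from broadest to narrowest. The class and field names below are hypothetical; the defaults mirror the table (global on, per-chatbot off until opted in).

```python
from dataclasses import dataclass, field

@dataclass
class ExpansionFlags:
    # Global kill switch: on by default
    global_enabled: bool = True
    # Per-chatbot opt-in: a chatbot is off unless its ID is listed here
    enabled_chatbots: set = field(default_factory=set)

    def expansion_active(self, chatbot_id, query_is_expandable):
        """Return True only if all three levels allow expansion."""
        if not self.global_enabled:
            return False
        if chatbot_id not in self.enabled_chatbots:
            return False
        # Per-query auto-skip (short or identifier queries) is decided by the caller
        return query_is_expandable

flags = ExpansionFlags(enabled_chatbots={"bot-a"})
active_for_tested_bot = flags.expansion_active("bot-a", query_is_expandable=True)
active_for_other_bot = flags.expansion_active("bot-b", query_is_expandable=True)
```

Keeping the per-chatbot level opt-in means a new chatbot never gets expansion by accident, and flipping `global_enabled` turns the feature off everywhere at once.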

Lesson 6: Choosing the Right Reranker

There are two approaches to re-scoring search results:

Cross-Encoder (simpler)

A specialized small model (~33M params) trained specifically for relevance scoring. Fast (~200ms for 20 docs) and strong on keyword matching. Struggles with intent: it may not connect "How do I cancel?" to a page about "subscription management."

LLM Reranking (smarter)

A general-purpose LLM (~600M params) that reasons about relevance. Slightly slower (~300ms for 20 docs), but understands intent, context, and implication. Better for complex queries.

Practical advice

Start with a cross-encoder + position-aware blending. The blending fixes the cross-encoder's main weakness (exact-match demotion). Upgrade to LLM reranking later if you have complex intent-based queries.

The Full Pipeline

User asks a question

"Do you ship to Berlin?"

Query rewriting (existing)

Resolves pronouns from conversation history ("it" → "the Pro Plan").

Expansion check

Is this query 3+ words and not an identifier? → Yes, proceed.

Generate 2 alternatives

"Berlin delivery area shipping coverage" + "European shipping destinations"

Embed all 3 queries (batched)

One batched API call, so embedding three queries takes a single round trip instead of three.

Run 6 parallel searches

3 vector + 3 keyword, all at the same time.

Weighted fusion + reranking

Original at 2× weight, position-aware blending, top 5 to the LLM.
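
The parallel search step could look roughly like this with asyncio. `vector_search` and `keyword_search` are hypothetical stand-ins that return placeholder IDs after a short sleep; a real system would call its vector index and keyword index there.

```python
import asyncio

async def vector_search(query):
    # Stand-in for a real vector index lookup
    await asyncio.sleep(0.01)
    return [f"vec:{query}"]

async def keyword_search(query):
    # Stand-in for a real keyword (e.g. BM25) lookup
    await asyncio.sleep(0.01)
    return [f"kw:{query}"]

async def search_all(queries):
    """Run one vector and one keyword search per query, all concurrently."""
    tasks = [
        search(query)
        for query in queries
        for search in (vector_search, keyword_search)
    ]
    # gather returns results in the same order as the tasks were passed in
    return await asyncio.gather(*tasks)

queries = [
    "Do you ship to Berlin?",                  # original
    "Berlin delivery area shipping coverage",  # expansion 1
    "European shipping destinations",          # expansion 2
]
result_lists = asyncio.run(search_all(queries))
```

Three queries times two search types gives six result lists. Because they run concurrently, the wall-clock cost is close to that of the slowest single search rather than the sum of all six.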

+180ms added latency (parallel searches)

~30% fewer dead-ends

Zero regressions with blending

How to Measure Success

Metric	What It Shows	Target
Dead-end rate	% of "I don't know" responses	30–40% reduction
Context diversity	Unique source pages per query	+1–2 more pages
User satisfaction	Thumbs up/down ratio	Measurable uplift in 2 weeks
Regression rate	Queries that got worse	Zero with position-aware blending
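
The dead-end metric is simple to compute from response logs. The sketch below assumes each logged response carries a boolean `dead_end` field (a hypothetical schema), and the before/after counts are invented purely to show the arithmetic; they are not measurements.

```python
def dead_end_rate(responses):
    """Share of responses flagged as 'I don't know'."""
    return sum(1 for r in responses if r["dead_end"]) / len(responses)

def relative_reduction(before, after):
    """Relative drop from the baseline rate, e.g. 0.35 means 35% fewer dead-ends."""
    return (before - after) / before

# Made-up logs: 20 dead-ends in 100 responses before, 13 in 100 after
baseline = [{"dead_end": i < 20} for i in range(100)]
with_expansion = [{"dead_end": i < 13} for i in range(100)]

reduction = relative_reduction(
    dead_end_rate(baseline),
    dead_end_rate(with_expansion),
)
```

In this invented example the reduction is about 35%, which would fall inside the 30-40% target range from the table.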

The Bigger Picture

Query expansion finds more documents — which also means it finds more contradictions. This is why knowledge lint and expansion are complementary: lint catches conflicts at training time, expansion catches them at query time.

Together with auto-synthesized knowledge, they form a complete pipeline: clean data → structured knowledge → better search → accurate answers.
