Implementing Query Expansion: 6 Hard Lessons from Building Multi-Query Search

April 22, 2026 · 11 min read

Query expansion sounds simple: search with three versions of a question instead of one. In practice, every step has pitfalls. Here are six lessons from building multi-query search in production.

Quick Recap

In a standard chatbot, the user's question becomes one search query. Query expansion generates two alternative versions, runs all three queries in parallel, and merges the results. The goal: find relevant pages that the original wording would miss.

(For the full overview, read Query Expansion: How AI Chatbots Find Answers You Didn't Know You Had.)

Lesson 1: The Original Query Must Win Ties

Think of expansion queries as "scouts" — they explore territory the original can't reach. But when the original finds what it's looking for, the scouts should step aside.

The customer's own words are almost always the best search signal. Expansion fills gaps — it shouldn't overpower the original. The fix: give the original query 2× weight in result scoring. If a page ranks #1 for the original and #5 for an expansion, it stays near the top.
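The article doesn't say exactly how the 2× weight enters the score. A minimal sketch, assuming weighted Reciprocal Rank Fusion (RRF): each ranked list adds `weight / (k + rank)` to every document it returns, with the common default `k = 60`. The function name `weighted_rrf` and the document IDs are hypothetical.

```python
from collections import defaultdict


def weighted_rrf(
    ranked_lists: list[list[str]],
    weights: list[float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each list contributes weight / (k + rank) for every document it contains,
    with rank starting at 1. A higher fused score means a better document.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    scores: dict[str, float] = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    original = ["error-e4021", "troubleshooting", "billing"]
    expansion_1 = ["troubleshooting", "network", "firmware", "billing", "error-e4021"]
    expansion_2 = ["network", "troubleshooting", "firmware"]
    # Original query at 2x weight versus all three queries weighted equally.
    for weights in ([2.0, 1.0, 1.0], [1.0, 1.0, 1.0]):
        fused = weighted_rrf([original, expansion_1, expansion_2], weights)
        print(weights, [doc_id for doc_id, _ in fused])
```

With these lists, the page that ranks #1 for the original and #5 for an expansion fuses to second place at 2× weight. With equal weights it drops to third, behind "network", a page that only the expansions found.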

Lesson 2: Position-Aware Blending Protects Exact Matches

Many chatbots use a reranker — a second model that re-scores search results. Rerankers are powerful, but they can push down exact matches in favor of documents with "richer context."

The problem: a user searches for "error code E-4021." Keyword search (BM25) finds the exact error page, but the reranker scores a general "Troubleshooting Guide" higher because it has more semantic overlap with the query.
The fix: position-aware blending. Top results trust the original search score, lower results trust the reranker more, and exact matches stay at the top.
Position (original search rank) | Search-score weight | Reranker weight | Why
Top 1–3 | 75% | 25% | Protect exact matches
4–10 | 60% | 40% | Balanced blend
11+ | 40% | 60% | Trust the reranker on weaker results
You can add position-aware blending even without query expansion. It improves reranking quality immediately.
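A minimal sketch of the blending table above, assuming both scores are already normalized to [0, 1] and that candidates arrive in their original search order. The `Candidate` type, the `blend` function, and the example scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    search_score: float    # retrieval score, assumed normalized to [0, 1]
    reranker_score: float  # reranker relevance, assumed normalized to [0, 1]


def search_weight(position: int) -> float:
    """Weight on the search score for a 1-based search position (per the table)."""
    if position <= 3:
        return 0.75
    if position <= 10:
        return 0.60
    return 0.40


def blend(candidates: list[Candidate]) -> list[tuple[str, float]]:
    """Re-sort candidates by a position-dependent mix of the two scores.

    `candidates` must already be in search order, best first.
    """
    blended = []
    for position, c in enumerate(candidates, start=1):
        w = search_weight(position)
        blended.append((c.doc_id, w * c.search_score + (1 - w) * c.reranker_score))
    return sorted(blended, key=lambda item: -item[1])


if __name__ == "__main__":
    results = [
        Candidate("error-e4021", search_score=1.0, reranker_score=0.55),
        Candidate("troubleshooting-guide", search_score=0.70, reranker_score=0.95),
    ]
    print(blend(results))
```

In this example the error page keeps first place (0.75 × 1.0 + 0.25 × 0.55 ≈ 0.89 versus 0.75 × 0.70 + 0.25 × 0.95 ≈ 0.76), even though the reranker alone prefers the troubleshooting guide.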

Lesson 3: Two Good Expansions Beat Five Mediocre Ones

It's tempting to generate 5 or 10 alternatives to maximize coverage. In practice, two well-crafted expansions beat five mediocre ones:

More queries = more noise: mediocre expansions push irrelevant documents into the results through sheer volume.
Diminishing returns: the first expansion catches 80% of vocabulary mismatches, and the second catches another 15%. After that, you're adding noise.

The sweet spot — two alternatives with specific goals:

1. Vocabulary reformulation: use different words for the same concept. "warranty" → "guarantee and replacement policy"
2. Keyword extraction: pull out searchable terms. "Do you ship to Berlin?" → "Berlin shipping delivery EU coverage"
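The article doesn't publish its expansion prompt. One way to ask an LLM for exactly these two alternatives, and to clean up what comes back, is sketched below; the prompt wording, `build_expansion_prompt`, and `parse_expansions` are hypothetical, and the model call itself is left out.

```python
import re

EXPANSION_PROMPT = (
    "Rewrite the search query below in exactly two ways.\n"
    "Line 1: the same question, using different words for the same concepts.\n"
    "Line 2: only the key searchable terms, separated by spaces.\n"
    "Return the two lines and nothing else.\n\n"
    "Query: {query}"
)

# Leading list markers a model may add anyway: "1.", "2)", "-", "*".
LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+")


def build_expansion_prompt(query: str) -> str:
    return EXPANSION_PROMPT.format(query=query.strip())


def parse_expansions(raw: str, original: str, limit: int = 2) -> list[str]:
    """Keep at most `limit` non-empty alternatives that differ from the original."""
    seen = {original.strip().lower()}
    alternatives: list[str] = []
    for line in raw.splitlines():
        text = LIST_MARKER.sub("", line).strip()
        # Skip blank lines, echoes of the original, and duplicates.
        if text and text.lower() not in seen:
            seen.add(text.lower())
            alternatives.append(text)
        if len(alternatives) == limit:
            break
    return alternatives
```

Capping the parser at two results enforces the "two good expansions" rule even when the model returns more lines than asked for.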

Lesson 4: Know When to Skip

Query type | Example | Expand?
Very short (1–2 words) | "pricing" | ❌ Skip
Exact identifiers | "error E-4021" | ❌ Skip
Product codes | "SKU-PRO-2026-X" | ❌ Skip
Natural questions | "What's your return policy?" | ✅ Expand
Multi-concept queries | "shipping time for bulk orders" | ✅ Expand
Start simple: expand queries with 3+ words, skip shorter ones. Add smarter heuristics later once you have data.
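The "3+ words, not an identifier" rule fits in a few lines. A minimal sketch; the identifier regex is an assumption (letters, a hyphen, then a token containing a digit, which matches "E-4021" and "SKU-PRO-2026-X") and will need tuning against your own product codes.

```python
import re

# Hypothetical identifier pattern: letters, a hyphen, then something with a digit.
IDENTIFIER = re.compile(r"\b[A-Za-z]+-[A-Za-z0-9-]*\d[A-Za-z0-9-]*\b")
MIN_WORDS = 3


def should_expand(query: str) -> bool:
    """Expand natural-language queries; skip short ones and exact identifiers."""
    if len(query.split()) < MIN_WORDS:
        return False
    if IDENTIFIER.search(query):
        return False
    return True


if __name__ == "__main__":
    for q in ["pricing", "SKU-PRO-2026-X", "What's your return policy?"]:
        print(q, should_expand(q))
```

The identifier check also catches longer questions that embed a code, such as "why does error E-4021 appear", which would otherwise pass the word-count test.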

Lesson 5: Feature Flags Are Non-Negotiable

Query expansion changes search behavior fundamentally. Rolling it out without the ability to toggle it per customer is asking for trouble.

Level | Purpose | Default
Global | Kill switch for the entire feature | On
Per-chatbot | Enable or disable for individual instances | Off (opt-in after testing)
Per-query | Auto-skip for short and identifier queries | Always on when expansion is enabled

The worst scenario: enabling expansion globally, seeing 30% better average quality, but not noticing that 5% of exact-match queries got worse. Feature flags let you catch this in a controlled environment.
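A minimal sketch of how the three flag levels might combine, assuming a simple in-memory config; `ExpansionFlags` and its fields are hypothetical names, and `query_is_expandable` stands in for whatever per-query skip check you use.

```python
from dataclasses import dataclass, field


@dataclass
class ExpansionFlags:
    global_enabled: bool = True  # kill switch, default on
    # Per-chatbot opt-in; any chatbot not listed defaults to off.
    per_chatbot: dict[str, bool] = field(default_factory=dict)

    def expansion_active(self, chatbot_id: str, query_is_expandable: bool) -> bool:
        """All three levels must agree before a query is expanded."""
        if not self.global_enabled:
            return False
        if not self.per_chatbot.get(chatbot_id, False):
            return False
        # Per-query level: result of the auto-skip check (short/identifier queries).
        return query_is_expandable
```

Because the per-chatbot default is off, turning the global switch on does not change behavior for anyone until a chatbot is explicitly opted in.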

Lesson 6: Choosing the Right Reranker

There are two approaches to re-scoring search results:

Cross-encoder (simpler): a specialized small model (~33M parameters) trained specifically for relevance scoring. Fast (~200ms for 20 docs) and strong on keyword matching, but it struggles with intent, for example matching "How do I cancel?" to a page about "subscription management."

LLM reranking (smarter): a general-purpose LLM (~600M parameters) that reasons about relevance. Slightly slower (~300ms for 20 docs), but it understands intent, context, and implication, which makes it better for complex queries.

Practical advice: start with a cross-encoder plus position-aware blending. The blending fixes the cross-encoder's main weakness (exact-match demotion). Upgrade to LLM reranking later if you have complex, intent-based queries.

The Full Pipeline

1. User asks a question: "Do you ship to Berlin?"
2. Query rewriting (existing step): resolves pronouns from conversation history ("it" → "the Pro Plan").
3. Expansion check: is the query 3+ words and not an identifier? Yes, so proceed.
4. Generate 2 alternatives: "Berlin delivery area shipping coverage" and "European shipping destinations".
5. Embed all 3 queries in one batch: a single API call, so no extra round trips.
6. Run 6 parallel searches: 3 vector + 3 keyword, all at the same time.
7. Weighted fusion + reranking: original query at 2× weight, position-aware blending, top 5 results passed to the LLM.
+180ms added latency
6 parallel searches
~30% fewer dead-ends
0 regressions with blending
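Steps 6 and 7 can be sketched with asyncio: every (query, backend) pair runs concurrently, and the resulting lists are fused with weighted Reciprocal Rank Fusion. RRF is an assumption here (the article only says the original gets 2× weight), embedding and reranking are left out, and `multi_query_search`, `SearchFn`, and the fake backends are hypothetical.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable

# A search backend takes a query and returns document IDs, best first.
SearchFn = Callable[[str], Awaitable[list[str]]]


async def multi_query_search(
    queries: list[str],
    backends: list[SearchFn],
    original_weight: float = 2.0,
    k: int = 60,
    top_n: int = 5,
) -> list[str]:
    """Run every query against every backend concurrently, then fuse.

    queries[0] is the (rewritten) original; the rest are expansions.
    """
    # 3 queries x 2 backends = 6 searches, all started at once.
    pairs = [(qi, search) for qi in range(len(queries)) for search in backends]
    results = await asyncio.gather(*(search(queries[qi]) for qi, search in pairs))

    # Weighted reciprocal rank fusion: lists from the original count double.
    scores: dict[str, float] = defaultdict(float)
    for (qi, _), docs in zip(pairs, results):
        weight = original_weight if qi == 0 else 1.0
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by doc ID.
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ranked[:top_n]


# Hypothetical stand-in backends that sleep to mimic network latency.
async def fake_vector_search(query: str) -> list[str]:
    await asyncio.sleep(0.1)
    return ["shipping-eu", "delivery-areas"]


async def fake_keyword_search(query: str) -> list[str]:
    await asyncio.sleep(0.1)
    return ["berlin-store"] if "Berlin" in query else ["eu-destinations"]


if __name__ == "__main__":
    queries = [
        "Do you ship to Berlin?",
        "Berlin delivery area shipping coverage",
        "European shipping destinations",
    ]
    top = asyncio.run(
        multi_query_search(queries, [fake_vector_search, fake_keyword_search])
    )
    print(top)
```

Because the six searches overlap, the demo takes roughly as long as one 0.1 s search rather than six, and returns `['shipping-eu', 'delivery-areas', 'berlin-store', 'eu-destinations']`.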

How to Measure Success

Metric | What it shows | Target
Dead-end rate | % of "I don't know" responses | 30–40% reduction
Context diversity | Unique source pages per query | 1–2 more pages
User satisfaction | Thumbs up/down ratio | Measurable uplift within 2 weeks
Regression rate | Queries that got worse | Zero with position-aware blending
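The first two metrics fall straight out of conversation logs. A minimal sketch, assuming each logged turn records whether the bot gave a real answer and which source pages it used, and assuming the dead-end target is a relative reduction; the `Turn` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    answered: bool           # False when the bot fell back to "I don't know"
    source_pages: list[str]  # IDs of pages passed to the LLM as context


def dead_end_rate(turns: list[Turn]) -> float:
    """Share of turns that ended in an "I don't know" response."""
    if not turns:
        return 0.0
    return sum(not t.answered for t in turns) / len(turns)


def context_diversity(turns: list[Turn]) -> float:
    """Average number of unique source pages per query."""
    if not turns:
        return 0.0
    return sum(len(set(t.source_pages)) for t in turns) / len(turns)


def relative_reduction(before: float, after: float) -> float:
    """Relative drop: a dead-end rate going from 0.20 to 0.13 gives 0.35 (35%)."""
    return (before - after) / before
```

Under the relative reading, a dead-end rate falling from 20% to 13% meets the 30–40% target, even though it is only a 7-point drop.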

The Bigger Picture

Query expansion retrieves more documents, which also means it surfaces more contradictions. This is why knowledge lint and expansion are complementary: lint catches conflicts at training time, while expansion surfaces the ones that slip through at query time.

Together with auto-synthesized knowledge, they form a complete pipeline: clean data → structured knowledge → better search → accurate answers.


Related: Query Expansion: The Concept | Knowledge Lint | Auto-Synthesized Knowledge | Dark AI Traffic

Build a smarter AI chatbot

GetGenius trains on your website and docs to deliver accurate, consistent answers 24/7. No per-seat pricing. AI included in every plan.
