Cross-Encoder vs. Logprob Reranking: A Practical Guide for AI Search

April 22, 2026 · 9 min read

Your search pipeline found 50 candidate documents. Now what? The reranker decides which ones the AI actually reads. Get this wrong, and your chatbot answers from the wrong document — confidently. Here's how to choose the right approach.

Why Reranking Matters

The first search stage — keyword matching and vector similarity — is fast but imprecise. It returns roughly relevant documents, but the ordering is often wrong. Your pricing page might rank below a blog post that mentions pricing in passing.

Retrieval is about casting a wide net. Reranking is about keeping only the best results.

The reranker takes the top 20–30 candidates and re-scores them with a more accurate model. The top 5–10 survivors become the AI's context for answering.
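The two-stage flow can be sketched in a few lines. This is an illustrative skeleton, not a specific library's API: `score` stands in for whichever reranker you choose (cross-encoder or LLM logprob), and `toy_score` is a deliberately naive word-overlap stand-in.

```python
def rerank(query, candidates, score, keep=8):
    """Re-score first-stage candidates and keep only the best.

    `score(query, doc) -> float` is whichever reranking model you
    plug in; higher means more relevant.
    """
    scored = [(score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:keep]]

# Toy scorer: fraction of query words present in the document.
# A real cross-encoder or LLM call goes here instead.
def toy_score(query, doc):
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

docs = ["laptop return policy details", "laptop specs", "shipping info"]
print(rerank("laptop return policy", docs, toy_score, keep=2))
```

The interesting design choice is that `rerank` is agnostic to the scorer, which is what makes the hybrid routing described later in this article cheap to implement.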

Approach 1: Cross-Encoder

A specialized model trained specifically for relevance scoring. It processes the query and document together — the query tokens attend to the document tokens — catching nuances that separate embeddings miss.

Example: "laptop return policy"

| Document | Score | Why |
| --- | --- | --- |
| Returns & Refunds page (mentions "laptop") | 0.92 | Trained to match "return policy" + product |
| Laptop product page (no return info) | 0.45 | About laptops, not about returns |
| General FAQ (mentions returns briefly) | 0.38 | Related but not specific |
**Fast.** ~200ms for 20 documents. Small model (33M params), runs on CPU.

**Deterministic.** Same input = same score. No temperature or sampling variance.

**Surface-level.** Matches patterns; doesn't truly understand intent. Struggles with vocabulary mismatch.

**Hard to customize.** Needs fine-tuning for new domains. Can't be improved with prompting.
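Cross-encoder models such as ms-marco-MiniLM emit an unbounded relevance logit per query-document pair; a sigmoid squashes it into the 0–1 range shown in the table above. The sketch below uses hand-picked example logits (chosen to land near the table's scores); the commented-out `CrossEncoder` call is the usual pattern if the sentence-transformers library is installed.

```python
import math

def sigmoid(x):
    """Map an unbounded cross-encoder logit to a 0-1 relevance score."""
    return 1.0 / (1.0 + math.exp(-x))

# Typical usage (assumes sentence-transformers is available):
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
#   logits = model.predict([(query, doc) for doc in candidates])
# Raw outputs are logits, so normalize before blending with other scores:
logits = [2.44, -0.20, -0.49]   # example values only
scores = [round(sigmoid(x), 2) for x in logits]
print(scores)
```

Normalizing to 0–1 matters later: position-aware blending mixes reranker scores with first-stage search scores, which only works if both live on the same scale.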

Approach 2: LLM Logprob Reranking

Instead of a specialized model, you ask a general-purpose LLM: "Is this document relevant? Yes or No." But instead of reading the text answer, you read the probability behind it — how confident the model is.

**How logprobs work.** If the model is 95% confident the answer is "Yes," that's a high relevance score. If it's 55% Yes / 45% No, that's marginal. This gives a smooth score from 0.0 to 1.0 — much more useful for ranking than a binary yes/no.
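Converting a logprob to a relevance score is one line of math: a log-probability exponentiates back to a probability. The sketch below assumes an OpenAI-style response where you can read the top-token log-probabilities for the model's one-word answer; the dict shape and the fallback logic are illustrative, not any particular SDK's format.

```python
import math

def relevance_from_logprobs(token_logprobs):
    """Turn the top-token logprobs of a 'Yes'/'No' answer into a
    0-1 relevance score.

    `token_logprobs` maps candidate tokens to log-probabilities
    (e.g. extracted from an OpenAI-style `top_logprobs` field).
    P("Yes") is the score; if "Yes" isn't among the top tokens,
    fall back to 1 - P("No").
    """
    if "Yes" in token_logprobs:
        return math.exp(token_logprobs["Yes"])
    if "No" in token_logprobs:
        return 1.0 - math.exp(token_logprobs["No"])
    return 0.5  # model answered something unexpected; stay neutral

# 95% confident "Yes" -> relevance 0.95
print(round(relevance_from_logprobs({"Yes": math.log(0.95)}), 2))
```

To keep the "slightly variable" downside (discussed below) in check, run the scoring call with temperature 0 and a fixed prompt; the residual variance then comes only from the provider, not your code.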

Example: "How do I cancel my subscription?"

| Document | Score | Why |
| --- | --- | --- |
| Subscription Management ("modify, pause, or end your plan") | 0.94 | LLM understands "cancel" = "end your plan" |
| Billing FAQ (mentions "cancel" directly) | 0.88 | Direct keyword match, less comprehensive |
| Pricing page (lists all plans) | 0.22 | Related topic but not about canceling |
**Deep understanding.** "cancel" = "end your plan" = "terminate subscription." Understands intent, not just words.

**Works across languages.** Cross-encoders are typically English-trained. LLMs handle multilingual queries natively.

**Slower.** ~300ms for 20 documents. Needs LLM API access or GPU.

**Slightly variable.** Scores can vary slightly across runs due to temperature. Prompt wording affects results.

The Exact-Match Problem

Both approaches share a failure mode. Query: "error code E-4021"

**What happens.** Keyword search finds the exact error page immediately. But the reranker scores a general "Troubleshooting Guide" higher because it has richer context and more vocabulary overlap with "error" concepts.

**The fix: position-aware blending.** Top results keep most of their original search score; lower results lean on the reranker. Exact matches stay at the top where they belong.
| Position | Search Score Weight | Reranker Score Weight |
| --- | --- | --- |
| Top 1–3 | 75% | 25% |
| 4–10 | 60% | 40% |
| 11+ | 40% | 60% |
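The weight table translates directly into a small blending function. This is a minimal sketch of the scheme described above (both scores are assumed to already be normalized to 0–1):

```python
def blend(position, search_score, rerank_score):
    """Position-aware blending: early positions trust the original
    search score, later positions trust the reranker.

    Weights follow the table above; `position` is the document's
    1-based rank in the original search results.
    """
    if position <= 3:
        w_search = 0.75
    elif position <= 10:
        w_search = 0.60
    else:
        w_search = 0.40
    return w_search * search_score + (1 - w_search) * rerank_score

# Exact-match page at position 1: a strong search score (0.98)
# survives a lukewarm reranker score (0.40).
print(round(blend(1, 0.98, 0.40), 3))
```

With plain reranking, that exact-match page would drop to 0.40 and sink below the generic troubleshooting guide; blended, it scores 0.835 and stays on top.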

Side-by-Side Comparison

| Dimension | Cross-Encoder | LLM Logprob | Winner |
| --- | --- | --- | --- |
| Speed (20 docs) | ~200ms | ~300ms | Cross-Encoder |
| Understanding depth | Pattern matching | Full reasoning | Logprob LLM |
| Vocabulary mismatch | Moderate | Strong | Logprob LLM |
| Simple queries | Great | Overkill | Cross-Encoder |
| Infrastructure | Lightweight (CPU) | Needs LLM/GPU | Cross-Encoder |
| Customization | Needs fine-tuning | Prompt engineering | Logprob LLM |

Which One Should You Use?

**High-volume support:** Cross-encoder + position-aware blending. Speed matters, and most queries are straightforward.

**Complex knowledge (legal, medical):** LLM logprob reranking. Intent reasoning catches domain-specific vocabulary.

**Multi-language support:** LLM logprob reranking. Cross-encoders are typically English-only.

**Mixed query types:** Hybrid: cross-encoder by default, escalate to LLM for long or complex queries.

**The hybrid approach.** Use the cross-encoder as the default (fast, reliable). Escalate to LLM reranking when the query is 10+ words, cross-encoder confidence is below 0.6, or the query contains negation ("not," "without," "except").
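The escalation rule is simple enough to write down directly. A minimal sketch, using the three triggers from the text (the thresholds are the article's suggested starting points, not tuned values):

```python
def should_escalate(query, cross_encoder_confidence):
    """Route a query to LLM logprob reranking when the cheap
    cross-encoder is likely to struggle.

    Triggers: query is 10+ words, cross-encoder confidence is
    below 0.6, or the query contains a negation word.
    """
    negations = {"not", "without", "except"}
    words = query.lower().split()
    return (
        len(words) >= 10
        or cross_encoder_confidence < 0.6
        or any(w.strip(".,?!") in negations for w in words)
    )

print(should_escalate("laptop return policy", 0.9))             # short, confident
print(should_escalate("plans without annual commitment", 0.9))  # negation
```

Because escalation is per-query, the expensive LLM path only runs on the minority of queries that need it; everything else stays on the ~200ms cross-encoder.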

Get Started

1. **Start with a cross-encoder.** ms-marco-MiniLM or Cohere Rerank. It's the 80/20 solution.
2. **Add position-aware blending.** Protect your top results from being demoted. Works immediately.
3. **Measure your baseline.** Track dead-end rate and answer quality before changing anything.
4. **Consider LLM reranking.** If you see vocabulary mismatch failures: queries that should match but don't.
