Cross-Encoder vs. Logprob Reranking: A Practical Guide for AI Search

April 22, 2026 · 9 min read

Your search pipeline found 50 candidate documents. Now what? The reranker decides which ones the AI actually reads. Get this wrong, and your chatbot answers from the wrong document — confidently. Here's how to choose the right approach.

Why Reranking Matters

The first search stage — keyword matching and vector similarity — is fast but imprecise. It returns roughly relevant documents, but the ordering is often wrong. Your pricing page might rank below a blog post that mentions pricing in passing.

Retrieval is about casting a wide net. Reranking is about keeping only the best results.

The reranker takes the top 20–30 candidates and re-scores them with a more accurate model. The top 5–10 survivors become the AI's context for answering.
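The two-stage flow can be sketched in a few lines. This is an illustrative skeleton, not a specific library's API: `score` stands in for whichever reranker you choose (cross-encoder or LLM logprob), and `toy_score` is a deliberately naive word-overlap stand-in.

```python
def rerank(query, candidates, score, keep=8):
    """Re-score first-stage candidates and keep only the best.

    `score(query, doc) -> float` is whichever reranking model you
    plug in; higher means more relevant.
    """
    scored = [(score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:keep]]

# Toy scorer: fraction of query words present in the document.
# A real cross-encoder or LLM call goes here instead.
def toy_score(query, doc):
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

docs = ["laptop return policy details", "laptop specs", "shipping info"]
print(rerank("laptop return policy", docs, toy_score, keep=2))
```

The interesting design choice is that `rerank` is agnostic to the scorer, which is what makes the hybrid routing described later in this article cheap to implement.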

Approach 1: Cross-Encoder

A specialized model trained specifically for relevance scoring. It processes the query and document together — the query tokens attend to the document tokens — catching nuances that separate embeddings miss.

Example: "laptop return policy"

| Document | Score | Why |
| --- | --- | --- |
| Returns & Refunds page (mentions "laptop") | 0.92 | Trained to match "return policy" + product |
| Laptop product page (no return info) | 0.45 | About laptops, not about returns |
| General FAQ (mentions returns briefly) | 0.38 | Related but not specific |
**Fast.** ~200ms for 20 documents. Small model (33M params), runs on CPU.

**Deterministic.** Same input = same score. No temperature or sampling variance.

**Surface-level.** Matches patterns; doesn't truly understand intent. Struggles with vocabulary mismatch.

**Hard to customize.** Needs fine-tuning for new domains. Can't be improved with prompting.
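Cross-encoder models such as ms-marco-MiniLM emit an unbounded relevance logit per query-document pair; a sigmoid squashes it into the 0–1 range shown in the table above. The sketch below uses hand-picked example logits (chosen to land near the table's scores); the commented-out `CrossEncoder` call is the usual pattern if the sentence-transformers library is installed.

```python
import math

def sigmoid(x):
    """Map an unbounded cross-encoder logit to a 0-1 relevance score."""
    return 1.0 / (1.0 + math.exp(-x))

# Typical usage (assumes sentence-transformers is available):
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
#   logits = model.predict([(query, doc) for doc in candidates])
# Raw outputs are logits, so normalize before blending with other scores:
logits = [2.44, -0.20, -0.49]   # example values only
scores = [round(sigmoid(x), 2) for x in logits]
print(scores)
```

Normalizing to 0–1 matters later: position-aware blending mixes reranker scores with first-stage search scores, which only works if both live on the same scale.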

Approach 2: LLM Logprob Reranking

Instead of a specialized model, you ask a general-purpose LLM: "Is this document relevant? Yes or No." But instead of reading the text answer, you read the probability behind it — how confident the model is.

**How logprobs work.** If the model is 95% confident the answer is "Yes," that's a high relevance score. If it's 55% Yes / 45% No, that's marginal. This gives a smooth score from 0.0 to 1.0 — much more useful for ranking than a binary yes/no.
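Converting a logprob to a relevance score is one line of math: a log-probability exponentiates back to a probability. The sketch below assumes an OpenAI-style response where you can read the top-token log-probabilities for the model's one-word answer; the dict shape and the fallback logic are illustrative, not any particular SDK's format.

```python
import math

def relevance_from_logprobs(token_logprobs):
    """Turn the top-token logprobs of a 'Yes'/'No' answer into a
    0-1 relevance score.

    `token_logprobs` maps candidate tokens to log-probabilities
    (e.g. extracted from an OpenAI-style `top_logprobs` field).
    P("Yes") is the score; if "Yes" isn't among the top tokens,
    fall back to 1 - P("No").
    """
    if "Yes" in token_logprobs:
        return math.exp(token_logprobs["Yes"])
    if "No" in token_logprobs:
        return 1.0 - math.exp(token_logprobs["No"])
    return 0.5  # model answered something unexpected; stay neutral

# 95% confident "Yes" -> relevance 0.95
print(round(relevance_from_logprobs({"Yes": math.log(0.95)}), 2))
```

To keep the "slightly variable" downside (discussed below) in check, run the scoring call with temperature 0 and a fixed prompt; the residual variance then comes only from the provider, not your code.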

Example: "How do I cancel my subscription?"

| Document | Score | Why |
| --- | --- | --- |
| Subscription Management ("modify, pause, or end your plan") | 0.94 | LLM understands "cancel" = "end your plan" |
| Billing FAQ (mentions "cancel" directly) | 0.88 | Direct keyword match, less comprehensive |
| Pricing page (lists all plans) | 0.22 | Related topic but not about canceling |
**Deep understanding.** "cancel" = "end your plan" = "terminate subscription." Understands intent, not just words.

**Works across languages.** Cross-encoders are typically English-trained. LLMs handle multilingual queries natively.

**Slower.** ~300ms for 20 documents. Needs LLM API access or GPU.

**Slightly variable.** Scores can vary slightly across runs due to temperature. Prompt wording affects results.

The Exact-Match Problem

Both approaches share a failure mode. Query: "error code E-4021"

**What happens.** Keyword search finds the exact error page immediately. But the reranker scores a general "Troubleshooting Guide" higher because it has richer context and more vocabulary overlap with "error" concepts.

**The fix: position-aware blending.** Top results keep most of their original search score; lower results lean on the reranker. Exact matches stay at the top where they belong.
| Position | Search Score Weight | Reranker Score Weight |
| --- | --- | --- |
| Top 1–3 | 75% | 25% |
| 4–10 | 60% | 40% |
| 11+ | 40% | 60% |
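The weight table translates directly into a small blending function. This is a minimal sketch of the scheme described above (both scores are assumed to already be normalized to 0–1):

```python
def blend(position, search_score, rerank_score):
    """Position-aware blending: early positions trust the original
    search score, later positions trust the reranker.

    Weights follow the table above; `position` is the document's
    1-based rank in the original search results.
    """
    if position <= 3:
        w_search = 0.75
    elif position <= 10:
        w_search = 0.60
    else:
        w_search = 0.40
    return w_search * search_score + (1 - w_search) * rerank_score

# Exact-match page at position 1: a strong search score (0.98)
# survives a lukewarm reranker score (0.40).
print(round(blend(1, 0.98, 0.40), 3))
```

With plain reranking, that exact-match page would drop to 0.40 and sink below the generic troubleshooting guide; blended, it scores 0.835 and stays on top.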

Side-by-Side Comparison

| Dimension | Cross-Encoder | LLM Logprob | Winner |
| --- | --- | --- | --- |
| Speed (20 docs) | ~200ms | ~300ms | Cross-Encoder |
| Understanding depth | Pattern matching | Full reasoning | Logprob LLM |
| Vocabulary mismatch | Moderate | Strong | Logprob LLM |
| Simple queries | Great | Overkill | Cross-Encoder |
| Infrastructure | Lightweight (CPU) | Needs LLM/GPU | Cross-Encoder |
| Customization | Needs fine-tuning | Prompt engineering | Logprob LLM |

Which One Should You Use?

**High-volume support:** Cross-encoder + position-aware blending. Speed matters, and most queries are straightforward.

**Complex knowledge (legal, medical):** LLM logprob reranking. Intent reasoning catches domain-specific vocabulary.

**Multi-language support:** LLM logprob reranking. Cross-encoders are typically English-only.

**Mixed query types:** Hybrid: cross-encoder by default, escalate to LLM for long or complex queries.

**The hybrid approach.** Use the cross-encoder as the default (fast, reliable). Escalate to LLM reranking when the query is 10+ words, cross-encoder confidence is below 0.6, or the query contains negation ("not," "without," "except").
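The escalation rule is simple enough to write down directly. A minimal sketch, using the three triggers from the text (the thresholds are the article's suggested starting points, not tuned values):

```python
def should_escalate(query, cross_encoder_confidence):
    """Route a query to LLM logprob reranking when the cheap
    cross-encoder is likely to struggle.

    Triggers: query is 10+ words, cross-encoder confidence is
    below 0.6, or the query contains a negation word.
    """
    negations = {"not", "without", "except"}
    words = query.lower().split()
    return (
        len(words) >= 10
        or cross_encoder_confidence < 0.6
        or any(w.strip(".,?!") in negations for w in words)
    )

print(should_escalate("laptop return policy", 0.9))             # short, confident
print(should_escalate("plans without annual commitment", 0.9))  # negation
```

Because escalation is per-query, the expensive LLM path only runs on the minority of queries that need it; everything else stays on the ~200ms cross-encoder.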

Get Started

1. **Start with a cross-encoder.** ms-marco-MiniLM or Cohere Rerank. It's the 80/20 solution.
2. **Add position-aware blending.** Protect your top results from being demoted. Works immediately.
3. **Measure your baseline.** Track dead-end rate and answer quality before changing anything.
4. **Consider LLM reranking.** If you see vocabulary mismatch failures: queries that should match but don't.
