Not every retrieval failure is an embedding failure.
Sometimes the real problem is earlier: the query was vague, contained multiple asks, or should have gone to a completely different source. That is why strong RAG systems often have a pre-retrieval layer that rewrites, decomposes, or routes the query before any search happens.
Three different problems, three different fixes
| Problem | Best first move | Why |
|---|---|---|
| Query is vague or uses mismatched language | Rewrite | Improve retrievability without changing intent |
| Query contains multiple asks | Decompose | Lets each sub-question retrieve cleaner evidence |
| Query belongs to a different corpus or tool | Route | Different sources need different retrieval methods |
The practical rule is to fix the query path before you spend more tokens on bigger prompts.
A concrete pre-retrieval pipeline
Here is a simple TypeScript sketch:
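```typescript
// Assumed helper signatures; their implementations are not shown here.
// Each could be a small model call or a rule-based heuristic.
declare function classifyQueryShape(query: string): Promise<string>;
declare function decomposeQuery(query: string): Promise<string[]>;
declare function rewriteQuery(query: string): Promise<string>;

export async function prepareQuery(userQuery: string) {
  const queryShape = await classifyQueryShape(userQuery);

  // Multiple asks: split into sub-queries that retrieve separately.
  if (queryShape === "multi_part") {
    return {
      route: "decompose",
      queries: await decomposeQuery(userQuery),
    };
  }

  // The answer lives in structured data: send it to a tool, not to search.
  if (queryShape === "structured_data_question") {
    return {
      route: "tool",
      queries: [userQuery],
    };
  }

  // The wording is the problem: normalize it, then search.
  if (queryShape === "vague_or_slangy") {
    return {
      route: "rewrite_then_search",
      queries: [await rewriteQuery(userQuery)],
    };
  }

  // Everything else is searched as written.
  return {
    route: "direct_search",
    queries: [userQuery],
  };
}
```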
This is useful because it separates three decisions clearly:
- should the query be normalized?
- should it be split?
- where should it go?
Once those decisions are explicit, you can evaluate them independently.
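The sketch above does not show how `classifyQueryShape` works. One cheap starting point is a rule-based classifier, shown below as a minimal sketch. The function name, the cue patterns, the three-word threshold, and the `"simple"` label for the fall-through case are all assumptions for illustration, not tuned values. A production system might use a small model call instead, but a rule-based version makes the classification decision easy to test on its own.

```typescript
type QueryShape =
  | "multi_part"
  | "structured_data_question"
  | "vague_or_slangy"
  | "simple"; // assumed name for queries that need no preparation

// Hypothetical cue patterns; replace them with patterns drawn from real traffic.
const MULTI_PART_CUES =
  /\b(and|also|as well as)\b.*\b(what|how|why|when|which|who)\b/i;
const STRUCTURED_CUES =
  /\b(how many|count of|total|average|sum of|per (day|week|month))\b/i;
const SLANG_CUES = /\b(idk|asap|pls|plz|thx|btw)\b/i;

function classifyQueryShapeHeuristic(userQuery: string): QueryShape {
  const query = userQuery.trim();
  const questionMarks = (query.match(/\?/g) ?? []).length;

  // Checked in the same order as prepareQuery: split, then tool, then rewrite.
  if (questionMarks > 1 || MULTI_PART_CUES.test(query)) return "multi_part";
  if (STRUCTURED_CUES.test(query)) return "structured_data_question";

  // Very short or slang-heavy queries are treated as candidates for rewriting.
  const wordCount = query.split(/\s+/).filter(Boolean).length;
  if (wordCount <= 3 || SLANG_CUES.test(query)) return "vague_or_slangy";

  return "simple";
}
```

Because this version is synchronous, it can stand in for the awaited classifier during tests, and its labels can be compared against a hand-labeled sample before any retrieval runs.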
Rewrite only when the wording is the problem
Rewriting helps when the query is:
- informal
- shorthand-heavy
- missing domain terms
- phrased differently from the indexed corpus
Good rewrite behavior preserves intent while improving searchability.
Bad rewrite behavior guesses what the answer should be or adds specificity the user did not provide.
If filters, metadata, or better chunking would solve the issue, rewriting is not the first fix.
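The "adds specificity the user did not provide" failure can be partly checked mechanically. The sketch below flags a rewrite that introduces numbers or capitalized terms absent from the original query. The function names and the two token patterns are assumptions for illustration; the check catches only the most obvious drift and does not replace reviewing rewrites against labeled examples.

```typescript
// Extracts tokens that usually carry specificity: numbers (including dotted
// versions and years) and capitalized words such as product or plan names.
function specificTokens(text: string): Set<string> {
  const matches: string[] =
    text.match(/\b\d+(?:\.\d+)*\b|\b[A-Z][A-Za-z0-9]+\b/g) ?? [];
  return new Set<string>(matches.map((token) => token.toLowerCase()));
}

// Returns the specific tokens that appear in the rewrite but not in the original.
function addedSpecifics(original: string, rewritten: string): string[] {
  const originalTokens = specificTokens(original);
  // Compare case-insensitively against every word in the original, so that
  // capitalizing an existing word is not counted as new information.
  const words: string[] = original.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const originalWords = new Set<string>(words);

  return Array.from(specificTokens(rewritten)).filter(
    (token) => !originalTokens.has(token) && !originalWords.has(token),
  );
}
```

For example, rewriting "why did my bill go up" as "Why did the invoice total increase in March 2024?" would be flagged for adding a month and a year the user never mentioned. A non-empty result is a reason to fall back to the original query or to log the rewrite for review.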
Decomposition helps when the user really asked two questions
A single query such as:
- "What changed in pricing and what does it mean for enterprise customers?"
actually contains at least two retrieval jobs:
- what changed
- what the impact is for a specific segment
If you keep those combined, retrieval often mixes weak evidence from both. Decomposition lets the system search and answer in smaller, cleaner parts.
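One way to keep the evidence for each sub-question clean is to retrieve per sub-query, keep the top results for each, and only then merge. The sketch below assumes retrieval has already happened and shows only the merge step. The `RetrievedChunk` shape, the `perQueryLimit` parameter, and the function name are hypothetical.

```typescript
interface RetrievedChunk {
  id: string;
  text: string;
  score: number; // higher means more relevant, as reported by the retriever
}

interface SubQueryEvidence {
  subQuery: string;
  chunks: RetrievedChunk[];
}

// Keeps the top chunks for each sub-query separately, so one sub-question's
// strong matches cannot crowd out another's, and drops chunk IDs that an
// earlier sub-query has already contributed.
function mergeSubQueryEvidence(
  resultsBySubQuery: Map<string, RetrievedChunk[]>,
  perQueryLimit = 3,
): SubQueryEvidence[] {
  const seen = new Set<string>();
  const merged: SubQueryEvidence[] = [];

  resultsBySubQuery.forEach((chunks, subQuery) => {
    const top = chunks
      .slice() // copy so the caller's array is not reordered
      .sort((a, b) => b.score - a.score)
      .filter((chunk) => !seen.has(chunk.id))
      .slice(0, perQueryLimit);
    top.forEach((chunk) => seen.add(chunk.id));
    merged.push({ subQuery, chunks: top });
  });

  return merged;
}
```

Keeping the evidence grouped by sub-query, rather than flattening it into one ranked list, also lets the generation step answer each part from its own evidence.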
Routing matters when your sources differ
A mature assistant usually has multiple destinations:
- product docs
- policy docs
- support tickets
- metrics dashboards
- SQL-backed systems
These should not all share the same retrieval path. Some need hybrid search. Some need a tool call. Some need SQL, not RAG.
Routing is often the most valuable part of the pre-retrieval layer because it stops the wrong engine from answering the question.
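A routing layer can start as an explicit table from destination to retrieval method. The sketch below uses the five destinations listed above. The method paired with each destination and the `RetrievalMethod` names are assumptions for illustration, since the right pairing depends on how each source is stored and indexed.

```typescript
type Destination =
  | "product_docs"
  | "policy_docs"
  | "support_tickets"
  | "metrics_dashboards"
  | "sql_systems";

type RetrievalMethod =
  | "hybrid_search"
  | "vector_search"
  | "tool_call"
  | "sql_query";

// Hypothetical pairing; adjust it to how each source is actually indexed.
const ROUTING_TABLE: Record<Destination, RetrievalMethod> = {
  product_docs: "hybrid_search",
  policy_docs: "hybrid_search",
  support_tickets: "vector_search",
  metrics_dashboards: "tool_call",
  sql_systems: "sql_query",
};

// Looks up the retrieval method and reports whether it is a document search,
// so callers can skip chunking and reranking for tool and SQL paths.
function planRetrieval(destination: Destination) {
  const method = ROUTING_TABLE[destination];
  const usesDocumentRetrieval =
    method === "hybrid_search" || method === "vector_search";
  return { destination, method, usesDocumentRetrieval };
}
```

Typing the table as `Record<Destination, RetrievalMethod>` means the compiler rejects a new destination that has no retrieval method assigned, which keeps the routing decision explicit as sources are added.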
What to measure
Track:
- rewrite lift on retrieval metrics
- decomposition lift on recall or answer correctness
- routing accuracy
- answer correctness after routing
If you only look at final answer quality, you will not know whether the improvement came from better query prep or better generation.
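Two of these metrics are straightforward to compute once you have labeled examples. The sketch below computes routing accuracy and rewrite lift, with lift measured as the change in mean recall@k. The record shapes and field names are hypothetical, and the code assumes each evaluation query comes with a known set of relevant document IDs.

```typescript
interface RoutingExample {
  expectedRoute: string;
  predictedRoute: string;
}

// Fraction of queries sent to the route a human labeler expected.
function routingAccuracy(examples: RoutingExample[]): number {
  if (examples.length === 0) return 0;
  const correct = examples.filter(
    (example) => example.expectedRoute === example.predictedRoute,
  ).length;
  return correct / examples.length;
}

// Fraction of the relevant documents that appear in the top k results.
function recallAtK(retrievedIds: string[], relevantIds: string[], k: number): number {
  if (relevantIds.length === 0) return 0;
  const topK = new Set<string>(retrievedIds.slice(0, k));
  const hits = relevantIds.filter((id) => topK.has(id)).length;
  return hits / relevantIds.length;
}

interface RewriteEvalCase {
  relevantIds: string[];
  retrievedWithOriginal: string[];
  retrievedWithRewrite: string[];
}

// Mean recall@k with the rewritten query minus mean recall@k with the original.
// A value near zero or below suggests the rewrite step is not earning its cost.
function rewriteLift(cases: RewriteEvalCase[], k: number): number {
  if (cases.length === 0) return 0;
  const totalGain = cases.reduce(
    (sum, evalCase) =>
      sum +
      recallAtK(evalCase.retrievedWithRewrite, evalCase.relevantIds, k) -
      recallAtK(evalCase.retrievedWithOriginal, evalCase.relevantIds, k),
    0,
  );
  return totalGain / cases.length;
}
```

Running both retrieval variants against the same labeled set isolates the effect of the rewrite step from anything the generation model does afterward.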
A practical worksheet
Use this before implementing:
Workflow:
Common query shapes:
Which shapes need rewriting:
Which shapes need decomposition:
Which shapes should route to tools or SQL:
Which shapes should use document retrieval:
Primary metric:
Guardrail metric:
That gives you a real pre-retrieval design instead of an intuitive guess.
The common anti-pattern
The common anti-pattern is rewriting every query because a query-rewrite step feels smart.
That often creates two new problems:
- extra cost on easy queries
- accidental drift away from the user's real wording
The better pattern is selective rewriting, selective decomposition, and explicit routing based on query shape.
How StackSpend helps
Pre-retrieval layers change spend by adding or removing search steps, tool calls, and generation retries. In StackSpend, you can compare cost per answered query before and after a routing layer launch, see whether rewrite-heavy traffic is actually improving outcomes, and spot when a "smarter" pre-retrieval design is adding cost without improving retrieval quality enough to justify it.
FAQ
Should I rewrite every query automatically?
Usually no. Rewrite only when the wording is vague, mismatched to the corpus, or otherwise likely to retrieve poorly.
When should I decompose a query?
When one user message contains multiple retrieval intents that are likely to require separate evidence sets.
How do I know a query should route to a tool instead of retrieval?
If the answer lives in a system of record such as a SQL database, an API, or account state, a tool path is usually safer than document retrieval.
What is the easiest mistake to make with query rewriting?
Accidentally changing user intent while trying to make the query more searchable.
Is pre-retrieval work worth the extra complexity?
Often yes, because it can improve answer quality more cheaply than increasing prompt size or switching to a larger generation model.