If the answer lives in a database, API, or validated report, the model should not "know" it from memory.
This sounds obvious, but many teams still build structured-data QA as if every question-answering task were generic RAG. That is how you get fluent answers to the wrong number.
Grounding means the answer is constrained by evidence
For structured-data QA, the safest pattern is:
- fetch first, deterministically
- explain second
The model can help map intent to a query or tool call, but the system of record should still provide the facts.
Choose the grounding path based on the source
| Source of truth | Best pattern | Why |
|---|---|---|
| Warehouse or database | SQL or query template path | The data is structured and queryable |
| Business system or API | Tool or function calling | The answer should come from live system data |
| Reports or policies | Retrieval with citations | The evidence is document-based |
| Mixed sources | Router plus grounded path per source | One answer path does not fit all source types |
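The routing table above can be sketched as a small dispatcher. This is a minimal illustration, not a real API: the `SourceKind` values and the path registry are assumptions about how you might wire it.

```typescript
// Hypothetical router: classify the question's source of truth, then
// dispatch to the matching grounded path. The classifier that produces
// `kind` is out of scope here.
type SourceKind = "warehouse" | "api" | "documents" | "mixed";

interface GroundedPath {
  kind: SourceKind;
  handler: (question: string) => Promise<string>;
}

function routeQuestion(kind: SourceKind, paths: GroundedPath[]): GroundedPath {
  const path = paths.find((p) => p.kind === kind);
  // Fail loudly rather than falling back to a generic RAG path.
  if (!path) throw new Error(`No grounded path registered for ${kind}`);
  return path;
}
```

The deliberate choice is that an unregistered source type is an error, not a silent fallback: falling back to generic retrieval is exactly the failure mode this section warns against.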
A concrete text-to-SQL safety pattern
Suppose a user asks, "What was AWS spend last week?"
The model can help convert that to a query, but you should still validate the query before execution.
```typescript
export async function answerSpendQuestion(question: string) {
  // Let the model draft a query from the user's intent.
  const draftSql = await generateSql(question);

  // Validate before execution: allowed tables only, no writes.
  validateSql(draftSql, {
    allowedTables: ["rollup_daily"],
    allowWriteStatements: false,
  });

  // Execute against the warehouse on a read-only path.
  const rows = await runReadOnlyQuery(draftSql);

  // The model explains the returned rows; it does not supply the facts.
  return summarizeRows(question, rows);
}
```
The point is not that text-to-SQL is impossible. The point is that it needs guardrails:
- allowed tables
- read-only enforcement
- schema-aware validation
- answer synthesis after execution
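Here is one way the `validateSql` helper from the example could look. This is a sketch under stated assumptions: a production validator should use a real SQL parser, while this regex version only illustrates the first two guardrails (read-only enforcement and an allowed-table list).

```typescript
// Illustrative guardrail validator. The interface mirrors the options
// object used in the example above; the regex-based checks are a
// simplification, not parser-grade validation.
interface SqlGuardrails {
  allowedTables: string[];
  allowWriteStatements: boolean;
}

function validateSql(sql: string, rules: SqlGuardrails): void {
  // Read-only enforcement: reject any statement containing write keywords.
  const writeKeywords = /\b(insert|update|delete|drop|alter|create|truncate)\b/i;
  if (!rules.allowWriteStatements && writeKeywords.test(sql)) {
    throw new Error("write statements are not allowed");
  }

  // Naive table extraction: identifiers following FROM or JOIN.
  const tables = [...sql.matchAll(/\b(?:from|join)\s+([a-z_][\w.]*)/gi)].map(
    (m) => m[1].toLowerCase(),
  );
  for (const table of tables) {
    if (!rules.allowedTables.includes(table)) {
      throw new Error(`table not allowed: ${table}`);
    }
  }
}
```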
If the answer should come from the warehouse, fetch it from the warehouse.
Tool-based QA is often better than generic RAG
Many common questions are really tool questions:
- what is the order status?
- which jobs failed?
- when was this invoice paid?
- what limit applies to this account?
Those should call systems of record directly. Retrieval over docs is helpful for policies and explanations, not for live account state.
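A minimal shape for this is a tool registry where each tool question resolves to a direct system-of-record lookup. The tool names and stubbed handlers below are illustrative assumptions, not a real integration.

```typescript
// Sketch of a tool registry. In production each handler would call the
// live system of record; here the lookups are stubbed to show the shape.
type ToolHandler = (args: Record<string, string>) => string;

const tools = new Map<string, ToolHandler>();

// Hypothetical tool names; the returned values are placeholder stubs.
tools.set("get_order_status", ({ orderId }) => `order ${orderId}: shipped`);
tools.set("get_failed_jobs", () => "2 jobs failed in the last run");

function callTool(name: string, args: Record<string, string>): string {
  const handler = tools.get(name);
  if (!handler) throw new Error(`unknown tool: ${name}`);
  return handler(args);
}
```

The model's job is limited to picking the tool and its arguments; the answer itself comes from the handler's system of record.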
Mixed answers need split responsibilities
Some questions need both raw values and explanation:
- "Why did spend rise last week?"
- "How many failed jobs did we have, and what changed?"
In those cases:
- fetch the values deterministically
- optionally fetch document or policy context
- let the model explain based on those grounded inputs
That is much safer than asking the model to infer the facts and the explanation in one step.
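The three steps above can be sketched as one pipeline. The function names passed in (`fetchMetrics`, `retrieveContext`, `explain`) are placeholders for whatever your system provides; the point is the ordering and the separation of responsibilities.

```typescript
// Split-responsibility sketch: facts come from a deterministic fetch,
// optional context from retrieval, and the model only explains over
// those grounded inputs.
interface GroundedInputs {
  facts: Record<string, number>;
  context: string[];
}

async function answerWhyQuestion(
  question: string,
  fetchMetrics: () => Promise<Record<string, number>>,
  retrieveContext: (q: string) => Promise<string[]>,
  explain: (q: string, inputs: GroundedInputs) => Promise<string>,
): Promise<string> {
  // Step 1: deterministic fetch. The facts never come from the model.
  const facts = await fetchMetrics();

  // Step 2: optional document or policy context.
  const context = await retrieveContext(question);

  // Step 3: the model explains, constrained to the grounded inputs.
  return explain(question, { facts, context });
}
```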
What to evaluate
For grounded QA, track:
- answer correctness
- evidence correctness
- unsupported-claim rate
- query or tool success rate
If the system sounds helpful but the answer is wrong, it is not production-ready.
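One of these metrics, unsupported-claim rate, is easy to compute once each answer is logged with per-claim evidence labels. The record shape below is an assumption about how you might log answers, not a standard schema.

```typescript
// Illustrative scoring for the unsupported-claim rate metric. Each
// answer is logged with the claims it made and whether each claim was
// backed by fetched evidence.
interface AnswerRecord {
  answerCorrect: boolean;
  claims: { supportedByEvidence: boolean }[];
}

function unsupportedClaimRate(records: AnswerRecord[]): number {
  const claims = records.flatMap((r) => r.claims);
  if (claims.length === 0) return 0;
  const unsupported = claims.filter((c) => !c.supportedByEvidence).length;
  return unsupported / claims.length;
}
```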
A fill-in grounding spec
Use this before shipping:
- Workflow:
- System of record:
- Is the answer live, historical, or policy-based?
- What should be fetched deterministically?
- What should the model only explain?
- What evidence should be shown back to the user?
- Primary metric:
- Guardrail metric:
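If you want the spec to be machine-checkable rather than a doc, it can live as a typed config per workflow. The field names below mirror the checklist; the example values for a spend-QA workflow are purely illustrative.

```typescript
// Hypothetical typed version of the fill-in grounding spec.
interface GroundingSpec {
  workflow: string;
  systemOfRecord: string;
  answerKind: "live" | "historical" | "policy";
  deterministicFetches: string[];
  modelExplainsOnly: string[];
  evidenceShownToUser: string[];
  primaryMetric: string;
  guardrailMetric: string;
}

// Example filled-in spec (illustrative values only).
const spendQaSpec: GroundingSpec = {
  workflow: "spend-qa",
  systemOfRecord: "warehouse",
  answerKind: "historical",
  deterministicFetches: ["weekly spend by provider"],
  modelExplainsOnly: ["trend narrative"],
  evidenceShownToUser: ["query result rows"],
  primaryMetric: "answer correctness",
  guardrailMetric: "unsupported-claim rate",
};
```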
This prevents the most common failure: using one answer path for data that clearly belongs to different sources.
The common failure mode
It is using RAG for everything because RAG feels like a general solution.
RAG is useful for:
- policies
- documentation
- narrative reports
It is not the best default for:
- live metrics
- account records
- operational statuses
If the answer is sitting in a table or API, route through the table or API.
How StackSpend helps
Grounded QA has distinct cost layers across model usage, query execution, and retrieval. In StackSpend, you can separate those layers by workflow, see whether one grounded assistant is overusing model explanation relative to cheap deterministic fetches, and watch cost per successful grounded answer as usage grows.
What to do next
- Hybrid search and reranking patterns for RAG
- Agentic tool-use patterns: planner, executor, and recovery
FAQ
When should I use text-to-SQL instead of RAG?
Use text-to-SQL when the answer belongs to structured tables and the user needs current or queryable facts, not just document explanations.
Should the model execute arbitrary SQL?
No. Use read-only enforcement, allowed-table validation, and preferably templates or validators that constrain what can run.
Can I combine SQL and retrieval in one answer?
Yes. Fetch the facts deterministically first, then add retrieved policy or documentation context if the user needs explanation.
What is the biggest risk in grounded QA?
Letting the model synthesize an answer that mixes grounded facts with unsupported claims. That is why evidence correctness matters alongside answer correctness.
Is retrieval still useful for structured-data QA?
Yes, but usually for surrounding narrative context such as policy, definitions, and documentation rather than for the live values themselves.