You probably do not need an agent.
That sounds dismissive, but it is the most useful starting point for this topic. Most teams who say "agent" really mean one of three simpler things:
- a fixed workflow with one or two tool calls
- a router that picks between a few known flows
- a planner that decomposes a task while an executor performs the steps
The reason this matters is practical. Open-ended loops are expensive, hard to test, and easy to let run past the point where a person should step in.
## Start with the lightest viable control pattern
| Pattern | Use when | Why it wins |
|---|---|---|
| Fixed workflow | The steps are known in advance | Lowest risk, easiest to test |
| Router plus specialists | A few task types need different paths | Keeps logic explicit while still flexible |
| Planner plus executor | The workflow needs decomposition or dynamic tool choice | Lets you separate thinking from doing |
| Open-ended agent loop | The environment is highly variable and recovery is well-designed | Only justified in narrower cases than most teams think |
If you can enumerate the steps in code, do that first. The model does not need to "discover" a workflow that your team already understands.
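When the steps are known, the "agent" can be a plain function. A minimal sketch of a fixed workflow, where every name below (`getInvoiceById`, `issueRefund`, the invoice shape) is an illustrative stub rather than a real API:

```typescript
// A fixed workflow: the steps are enumerated in code, not discovered
// by a model at run time. All helpers here are illustrative stubs.
type Invoice = { id: string; amountCents: number };

async function getInvoiceById(id: string): Promise<Invoice> {
  // Stub standing in for a real billing API call.
  return { id, amountCents: 1999 };
}

async function issueRefund(_invoice: Invoice): Promise<void> {
  // Stub standing in for a real refund call.
}

async function refundWorkflow(invoiceId: string): Promise<string> {
  const invoice = await getInvoiceById(invoiceId); // tool call 1
  await issueRefund(invoice);                      // tool call 2
  return `refunded ${invoice.amountCents} cents on ${invoice.id}`;
}
```

There is nothing for a planner to plan here, which is exactly the point: the sequence is testable, reviewable, and cheap.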
## The production issue is usually recovery, not planning
Teams spend a lot of energy on planner prompts and not enough on what happens when tools fail. In practice, production pain usually comes from:
- malformed tool arguments
- timeouts
- bad permissions
- contradictory retrieved evidence
- repeated retries that do not add new information
That is why the core design question is not "how smart is the planner?" It is "what happens when the first tool call fails?"
## A concrete planner-executor-recovery loop
Here is a compact TypeScript example of a state-machine-based agent loop. The state machine logic is model-agnostic — it works the same whether the underlying planner uses OpenAI, Anthropic, or any other provider's tool-calling API:
```typescript
// createPlan, executeStep, executeFallback, escalate, and
// requestClarification are assumed to be defined elsewhere.

type RunState =
  | "success"
  | "retry_once"
  | "fallback"
  | "ask_user"
  | "escalate";

type StepResult = {
  ok: boolean;
  retryable: boolean;
  canFallback: boolean;
  missingInput?: string;
  output?: unknown;
};

export async function runWorkflow(task: string) {
  const plan = await createPlan(task);

  for (const step of plan.steps) {
    // Each step gets its own retry budget; spending the single retry
    // on one step should not disable retries for later steps.
    let retries = 0;
    const result = await executeStep(step);
    const nextState = decideRecoveryState(result, retries);

    if (nextState === "success") {
      continue;
    }
    if (nextState === "retry_once") {
      retries += 1;
      const retryResult = await executeStep(step);
      if (!retryResult.ok) {
        return escalate(step, retryResult);
      }
      continue;
    }
    if (nextState === "fallback") {
      const fallbackResult = await executeFallback(step);
      if (!fallbackResult.ok) {
        return escalate(step, fallbackResult);
      }
      continue;
    }
    if (nextState === "ask_user") {
      return requestClarification(step, result.missingInput ?? "missing input");
    }
    return escalate(step, result);
  }
  return { status: "completed" };
}

function decideRecoveryState(result: StepResult, retries: number): RunState {
  if (result.ok) return "success";
  if (result.retryable && retries === 0) return "retry_once";
  if (result.canFallback) return "fallback";
  if (result.missingInput) return "ask_user";
  return "escalate";
}
```
This is intentionally less magical than many demo agents. That is a good thing. The workflow is explicit enough that you can inspect why it retried, why it fell back, and when it stopped.
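Because `decideRecoveryState` is a pure function, the recovery policy can be exercised in isolation. A quick sketch, redeclaring the types so the snippet stands alone:

```typescript
type RunState = "success" | "retry_once" | "fallback" | "ask_user" | "escalate";
type StepResult = { ok: boolean; retryable: boolean; canFallback: boolean; missingInput?: string };

function decideRecoveryState(result: StepResult, retries: number): RunState {
  if (result.ok) return "success";
  if (result.retryable && retries === 0) return "retry_once";
  if (result.canFallback) return "fallback";
  if (result.missingInput) return "ask_user";
  return "escalate";
}

// A timeout is retried once; on the second failure it falls back.
const timeout: StepResult = { ok: false, retryable: true, canFallback: true };
console.log(decideRecoveryState(timeout, 0)); // "retry_once"
console.log(decideRecoveryState(timeout, 1)); // "fallback"

// A permissions failure with no fallback escalates immediately.
const forbidden: StepResult = { ok: false, retryable: false, canFallback: false };
console.log(decideRecoveryState(forbidden, 0)); // "escalate"
```

This kind of table-driven test is where you encode decisions like "one retry, then fall back" so they survive refactors.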
## Keep the planner away from direct action
A useful control boundary is:
- planner proposes steps
- executor calls tools
- validator checks outputs
- recovery layer decides next action
Do not let the planner directly fire tools if you care about observability or safety. When the same component both thinks and acts, debugging becomes much harder.
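One way to enforce that boundary is to give each role its own interface, so the type system makes it awkward for the planner to touch a tool. The interfaces and names below are illustrative, not a specific framework's API:

```typescript
// One interface per role. The planner can only return Steps; only the
// executor touches tools; the recovery policy owns the next move.
type Step = { tool: string; args: Record<string, unknown> };
type ToolOutput = { ok: boolean; data?: unknown; error?: string };

interface Planner {
  propose(task: string): Promise<Step[]>;      // thinks, never acts
}
interface Executor {
  run(step: Step): Promise<ToolOutput>;        // acts, never plans
}
interface Validator {
  check(step: Step, out: ToolOutput): boolean; // judges, never retries
}
interface RecoveryPolicy {
  next(out: ToolOutput): "continue" | "retry" | "fallback" | "escalate";
}

// A minimal wiring to show the boundary in action.
const echoExecutor: Executor = {
  run: async (step) => ({ ok: true, data: step.args }),
};
const basicValidator: Validator = {
  check: (_step, out) => out.ok && out.data !== undefined,
};
```

With this split, logs naturally record who proposed a step, who ran it, and who decided what happened next.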
## What a good tool contract looks like
Tool use works best when each tool has:
- a narrow purpose
- explicit input validation
- structured output
- clear failure modes
Bad tool design is vague: "lookup_account" that might search, edit, infer, or summarize.
Good tool design is narrow: "get_invoice_by_id" or "search_policy_documents."
The narrower the contract, the easier it is to evaluate tool selection and recovery quality.
## When a router is enough
Many "agentic" products are really routers with a few specialized paths:
- refund request -> billing workflow
- account change -> identity workflow
- policy question -> retrieval workflow
That is often enough to create the feel of autonomy without taking on the complexity of a long-running loop.
The wrong reason to add an agent loop is that a fixed workflow looks less impressive in a product demo.
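A router can be as small as a classifier plus a switch. In the sketch below, intent detection is a keyword stub; in a real system it would be a lightweight classifier or a single model call that returns one of these labels:

```typescript
type Intent = "refund_request" | "account_change" | "policy_question" | "unknown";

// Keyword stub standing in for a real intent classifier.
function classify(message: string): Intent {
  const m = message.toLowerCase();
  if (m.includes("refund")) return "refund_request";
  if (m.includes("email") || m.includes("password")) return "account_change";
  if (m.includes("policy")) return "policy_question";
  return "unknown";
}

// Each intent maps to one explicit, separately testable workflow.
function route(message: string): string {
  switch (classify(message)) {
    case "refund_request": return "billing_workflow";
    case "account_change": return "identity_workflow";
    case "policy_question": return "retrieval_workflow";
    default: return "human_handoff";
  }
}
```

The "unknown" branch is the important one: a router that must pick a workflow will misroute, while one that can hand off stays honest about its coverage.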
## Metrics that matter more than vibes
Evaluate agentic systems as workflows, not personalities.
Track:
- task success rate
- tool success rate
- recovery success rate
- escalation correctness
- average step count
- cost per successful task
If an "agent upgrade" improves completion by 3% but doubles average step count and review load, the economics may have gotten worse.
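The last two metrics fall directly out of run logs. A small sketch, with illustrative field names and costs in cents to avoid float noise:

```typescript
type RunLog = { success: boolean; steps: number; costCents: number };

function costPerSuccessfulTask(runs: RunLog[]): number {
  const successes = runs.filter((r) => r.success).length;
  // Failed runs still count toward spend: failures are not free.
  const totalCost = runs.reduce((sum, r) => sum + r.costCents, 0);
  return successes === 0 ? Infinity : totalCost / successes;
}

function averageStepCount(runs: RunLog[]): number {
  return runs.length === 0 ? 0 : runs.reduce((s, r) => s + r.steps, 0) / runs.length;
}
```

The denominator choice matters: dividing total spend by successes, not by all runs, is what makes a flailing agent show up as expensive rather than merely busy.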
## A practical autonomy worksheet

Use this before you build:

Workflow:
- Are the steps mostly known in advance?
- Are tool contracts stable?
- What actions are high risk?
- Which failures are safe to retry?
- What fallback paths exist?
- When must a human take over?
- Maximum allowed loop depth:
- Primary metric:
- Guardrail metric:
If you cannot answer the retry and escalation questions, the workflow is not ready for more autonomy.
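The worksheet can live in the codebase as a typed config, so autonomy limits get reviewed like any other change. The type and the example answers below are illustrative, not recommendations:

```typescript
type AutonomyWorksheet = {
  stepsKnownInAdvance: boolean;
  toolContractsStable: boolean;
  highRiskActions: string[];
  safeToRetry: string[];
  fallbackPaths: Record<string, string>;   // step -> fallback step
  humanTakeoverTriggers: string[];
  maxLoopDepth: number;
  primaryMetric: string;
  guardrailMetric: string;
};

const refundAgentWorksheet: AutonomyWorksheet = {
  stepsKnownInAdvance: true,
  toolContractsStable: true,
  highRiskActions: ["issue_refund"],
  safeToRetry: ["get_invoice_by_id", "search_policy_documents"],
  fallbackPaths: { search_policy_documents: "keyword_search" },
  humanTakeoverTriggers: ["refund over threshold", "invoice not found"],
  maxLoopDepth: 2,
  primaryMetric: "task success rate",
  guardrailMetric: "cost per successful task",
};

// Treat unanswered retry/escalation questions as "not ready".
function readyForMoreAutonomy(w: AutonomyWorksheet): boolean {
  return w.safeToRetry.length > 0 && w.humanTakeoverTriggers.length > 0;
}
```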
## The common anti-pattern
The common anti-pattern is "let the model keep trying."
That usually means:
- token spend keeps climbing
- the same error repeats
- no new evidence is added
- the system looks active but is not making progress
A small explicit recovery state machine beats an unconstrained loop in most business workflows.
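A loop guard that encodes "no new evidence" can be a few lines. In this sketch, plain error strings stand in for structured tool errors:

```typescript
// Stop when the loop exceeds its depth budget, or when the latest
// error repeats the previous one, meaning another attempt added no
// new information.
function shouldStop(errorHistory: string[], maxDepth: number): boolean {
  if (errorHistory.length >= maxDepth) return true;
  const n = errorHistory.length;
  return n >= 2 && errorHistory[n - 1] === errorHistory[n - 2];
}
```

Checking consecutive duplicates rather than total count is deliberate: two different errors may mean the agent is learning, but the same error twice means it is spending tokens to stand still.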
## How StackSpend helps
Agentic workflows hide spend inside retries, extra tool calls, and overlong sessions. The Data Explorer lets you filter by provider to compare average cost per session before and after a planner rollout, so "autonomous" does not become a synonym for more billable steps. The Monitoring view surfaces anomalies quickly — if a retry loop starts hammering an API after a deployment, you get a spend alert rather than discovering the problem in the next billing cycle. Setting a Budget on your agentic workflow service gives you a hard ceiling that triggers before a misbehaving agent runs up an unexpected bill.
## FAQ
### When do I actually need a planner?
Use a planner when the workflow cannot be fully enumerated in advance and the system genuinely needs to decompose the task or choose among tools dynamically.
### Should the planner call tools directly?
Usually no. Keeping planning separate from execution makes failures easier to reason about and safer to control.
### How many retries should an agent get?
Usually fewer than demo videos suggest. In many production workflows, one retry plus one fallback path is a better design than repeated looping.
### What is the best signal that an agent is over-designed?
If a fixed workflow or router can already solve the job, an open-ended loop is usually unnecessary complexity.
### How do I know if recovery is working?
Look at recovery success rate, escalation correctness, and cost per successful task. If retries are rising without better outcomes, the recovery design is weak.