You probably do not need an agent.
That sounds dismissive, but it is the most useful starting point for this topic. Most teams who say "agent" really mean one of three simpler things:
- a fixed workflow with one or two tool calls
- a router that picks between a few known flows
- a planner that decomposes a task while an executor performs the steps
The reason this matters is practical. Open-ended loops are expensive, hard to test, and easy to let run past the point where a person should step in.
## Start with the lightest viable control pattern
| Pattern | Use when | Why it wins |
|---|---|---|
| Fixed workflow | The steps are known in advance | Lowest risk, easiest to test |
| Router plus specialists | A few task types need different paths | Keeps logic explicit while still flexible |
| Planner plus executor | The workflow needs decomposition or dynamic tool choice | Lets you separate thinking from doing |
| Open-ended agent loop | The environment is highly variable and recovery is well-designed | Only justified in narrower cases than most teams think |
If you can enumerate the steps in code, do that first. The model does not need to "discover" a workflow that your team already understands.
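When the steps are known, the "agent" can be a plain function. A minimal sketch of a fixed workflow, where every name below (`getInvoiceById`, `issueRefund`, the invoice shape) is an illustrative stub rather than a real API:

```typescript
// A fixed workflow: the steps are enumerated in code, not discovered
// by a model at run time. All helpers here are illustrative stubs.
type Invoice = { id: string; amountCents: number };

async function getInvoiceById(id: string): Promise<Invoice> {
  // Stub standing in for a real billing API call.
  return { id, amountCents: 1999 };
}

async function issueRefund(_invoice: Invoice): Promise<void> {
  // Stub standing in for a real refund call.
}

async function refundWorkflow(invoiceId: string): Promise<string> {
  const invoice = await getInvoiceById(invoiceId); // tool call 1
  await issueRefund(invoice);                      // tool call 2
  return `refunded ${invoice.amountCents} cents on ${invoice.id}`;
}
```

There is nothing for a planner to plan here, which is exactly the point: the sequence is testable, reviewable, and cheap.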
## The production issue is usually recovery, not planning
Teams spend a lot of energy on planner prompts and not enough on what happens when tools fail. In practice, production pain usually comes from:
- malformed tool arguments
- timeouts
- bad permissions
- contradictory retrieved evidence
- repeated retries that do not add new information
That is why the core design question is not "how smart is the planner?" It is "what happens when the first tool call fails?"
## A concrete planner-executor-recovery loop
Here is a compact TypeScript example of a state-machine-based agent loop. The state machine logic is model-agnostic — it works the same whether the underlying planner uses OpenAI, Anthropic, or any other provider's tool-calling API:
```typescript
// createPlan, executeStep, executeFallback, escalate, and
// requestClarification are assumed to be defined elsewhere.

type RunState =
  | "success"
  | "retry_once"
  | "fallback"
  | "ask_user"
  | "escalate";

type StepResult = {
  ok: boolean;
  retryable: boolean;
  canFallback: boolean;
  missingInput?: string;
  output?: unknown;
};

export async function runWorkflow(task: string) {
  const plan = await createPlan(task);

  for (const step of plan.steps) {
    // Each step gets its own retry budget; spending the single retry
    // on one step should not disable retries for later steps.
    let retries = 0;
    const result = await executeStep(step);
    const nextState = decideRecoveryState(result, retries);

    if (nextState === "success") {
      continue;
    }
    if (nextState === "retry_once") {
      retries += 1;
      const retryResult = await executeStep(step);
      if (!retryResult.ok) {
        return escalate(step, retryResult);
      }
      continue;
    }
    if (nextState === "fallback") {
      const fallbackResult = await executeFallback(step);
      if (!fallbackResult.ok) {
        return escalate(step, fallbackResult);
      }
      continue;
    }
    if (nextState === "ask_user") {
      return requestClarification(step, result.missingInput ?? "missing input");
    }
    return escalate(step, result);
  }
  return { status: "completed" };
}

function decideRecoveryState(result: StepResult, retries: number): RunState {
  if (result.ok) return "success";
  if (result.retryable && retries === 0) return "retry_once";
  if (result.canFallback) return "fallback";
  if (result.missingInput) return "ask_user";
  return "escalate";
}
```
This is intentionally less magical than many demo agents. That is a good thing. The workflow is explicit enough that you can inspect why it retried, why it fell back, and when it stopped.
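Because `decideRecoveryState` is a pure function, the recovery policy can be exercised in isolation. A quick sketch, redeclaring the types so the snippet stands alone:

```typescript
type RunState = "success" | "retry_once" | "fallback" | "ask_user" | "escalate";
type StepResult = { ok: boolean; retryable: boolean; canFallback: boolean; missingInput?: string };

function decideRecoveryState(result: StepResult, retries: number): RunState {
  if (result.ok) return "success";
  if (result.retryable && retries === 0) return "retry_once";
  if (result.canFallback) return "fallback";
  if (result.missingInput) return "ask_user";
  return "escalate";
}

// A timeout is retried once; on the second failure it falls back.
const timeout: StepResult = { ok: false, retryable: true, canFallback: true };
console.log(decideRecoveryState(timeout, 0)); // "retry_once"
console.log(decideRecoveryState(timeout, 1)); // "fallback"

// A permissions failure with no fallback escalates immediately.
const forbidden: StepResult = { ok: false, retryable: false, canFallback: false };
console.log(decideRecoveryState(forbidden, 0)); // "escalate"
```

This kind of table-driven test is where you encode decisions like "one retry, then fall back" so they survive refactors.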
## Keep the planner away from direct action
A useful control boundary is:
- planner proposes steps
- executor calls tools
- validator checks outputs
- recovery layer decides next action
Do not let the planner directly fire tools if you care about observability or safety. When the same component both thinks and acts, debugging becomes much harder.
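One way to enforce that boundary is to give each role its own interface, so the type system makes it awkward for the planner to touch a tool. The interfaces and names below are illustrative, not a specific framework's API:

```typescript
// One interface per role. The planner can only return Steps; only the
// executor touches tools; the recovery policy owns the next move.
type Step = { tool: string; args: Record<string, unknown> };
type ToolOutput = { ok: boolean; data?: unknown; error?: string };

interface Planner {
  propose(task: string): Promise<Step[]>;      // thinks, never acts
}
interface Executor {
  run(step: Step): Promise<ToolOutput>;        // acts, never plans
}
interface Validator {
  check(step: Step, out: ToolOutput): boolean; // judges, never retries
}
interface RecoveryPolicy {
  next(out: ToolOutput): "continue" | "retry" | "fallback" | "escalate";
}

// A minimal wiring to show the boundary in action.
const echoExecutor: Executor = {
  run: async (step) => ({ ok: true, data: step.args }),
};
const basicValidator: Validator = {
  check: (_step, out) => out.ok && out.data !== undefined,
};
```

With this split, logs naturally record who proposed a step, who ran it, and who decided what happened next.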
## What a good tool contract looks like
Tool use works best when each tool has:
- a narrow purpose
- explicit input validation
- structured output
- clear failure modes
Bad tool design is vague: "lookup_account" that might search, edit, infer, or summarize.
Good tool design is narrow: "get_invoice_by_id" or "search_policy_documents."
The narrower the contract, the easier it is to evaluate tool selection and recovery quality.
## When a router is enough
Many "agentic" products are really routers with a few specialized paths:
- refund request -> billing workflow
- account change -> identity workflow
- policy question -> retrieval workflow
That is often enough to create the feel of autonomy without taking on the complexity of a long-running loop.
The wrong reason to add an agent loop is that a fixed workflow looks less impressive in a product demo.
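A router can be as small as a classifier plus a switch. In the sketch below, intent detection is a keyword stub; in a real system it would be a lightweight classifier or a single model call that returns one of these labels:

```typescript
type Intent = "refund_request" | "account_change" | "policy_question" | "unknown";

// Keyword stub standing in for a real intent classifier.
function classify(message: string): Intent {
  const m = message.toLowerCase();
  if (m.includes("refund")) return "refund_request";
  if (m.includes("email") || m.includes("password")) return "account_change";
  if (m.includes("policy")) return "policy_question";
  return "unknown";
}

// Each intent maps to one explicit, separately testable workflow.
function route(message: string): string {
  switch (classify(message)) {
    case "refund_request": return "billing_workflow";
    case "account_change": return "identity_workflow";
    case "policy_question": return "retrieval_workflow";
    default: return "human_handoff";
  }
}
```

The "unknown" branch is the important one: a router that must pick a workflow will misroute, while one that can hand off stays honest about its coverage.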
## Metrics that matter more than vibes
Evaluate agentic systems as workflows, not personalities.
Track:
- task success rate
- tool success rate
- recovery success rate
- escalation correctness
- average step count
- cost per successful task
If an "agent upgrade" improves completion by 3% but doubles average step count and review load, the economics may have gotten worse.
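The last two metrics fall directly out of run logs. A small sketch, with illustrative field names and costs in cents to avoid float noise:

```typescript
type RunLog = { success: boolean; steps: number; costCents: number };

function costPerSuccessfulTask(runs: RunLog[]): number {
  const successes = runs.filter((r) => r.success).length;
  // Failed runs still count toward spend: failures are not free.
  const totalCost = runs.reduce((sum, r) => sum + r.costCents, 0);
  return successes === 0 ? Infinity : totalCost / successes;
}

function averageStepCount(runs: RunLog[]): number {
  return runs.length === 0 ? 0 : runs.reduce((s, r) => s + r.steps, 0) / runs.length;
}
```

The denominator choice matters: dividing total spend by successes, not by all runs, is what makes a flailing agent show up as expensive rather than merely busy.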
## A practical autonomy worksheet

Use this before you build:

Workflow:
- Are the steps mostly known in advance?
- Are tool contracts stable?
- What actions are high risk?
- Which failures are safe to retry?
- What fallback paths exist?
- When must a human take over?
- Maximum allowed loop depth:
- Primary metric:
- Guardrail metric:
If you cannot answer the retry and escalation questions, the workflow is not ready for more autonomy.
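The worksheet can live in the codebase as a typed config, so autonomy limits get reviewed like any other change. The type and the example answers below are illustrative, not recommendations:

```typescript
type AutonomyWorksheet = {
  stepsKnownInAdvance: boolean;
  toolContractsStable: boolean;
  highRiskActions: string[];
  safeToRetry: string[];
  fallbackPaths: Record<string, string>;   // step -> fallback step
  humanTakeoverTriggers: string[];
  maxLoopDepth: number;
  primaryMetric: string;
  guardrailMetric: string;
};

const refundAgentWorksheet: AutonomyWorksheet = {
  stepsKnownInAdvance: true,
  toolContractsStable: true,
  highRiskActions: ["issue_refund"],
  safeToRetry: ["get_invoice_by_id", "search_policy_documents"],
  fallbackPaths: { search_policy_documents: "keyword_search" },
  humanTakeoverTriggers: ["refund over threshold", "invoice not found"],
  maxLoopDepth: 2,
  primaryMetric: "task success rate",
  guardrailMetric: "cost per successful task",
};

// Treat unanswered retry/escalation questions as "not ready".
function readyForMoreAutonomy(w: AutonomyWorksheet): boolean {
  return w.safeToRetry.length > 0 && w.humanTakeoverTriggers.length > 0;
}
```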
## The common anti-pattern
The common anti-pattern is "let the model keep trying."
That usually means:
- token spend keeps climbing
- the same error repeats
- no new evidence is added
- the system looks active but is not making progress
A small explicit recovery state machine beats an unconstrained loop in most business workflows.
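A loop guard that encodes "no new evidence" can be a few lines. In this sketch, plain error strings stand in for structured tool errors:

```typescript
// Stop when the loop exceeds its depth budget, or when the latest
// error repeats the previous one, meaning another attempt added no
// new information.
function shouldStop(errorHistory: string[], maxDepth: number): boolean {
  if (errorHistory.length >= maxDepth) return true;
  const n = errorHistory.length;
  return n >= 2 && errorHistory[n - 1] === errorHistory[n - 2];
}
```

Checking consecutive duplicates rather than total count is deliberate: two different errors may mean the agent is learning, but the same error twice means it is spending tokens to stand still.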
## How StackSpend helps
Agentic workflows hide spend inside retries, extra tool calls, and overlong sessions. The Data Explorer lets you filter by provider to compare average cost per session before and after a planner rollout, so "autonomous" does not become a synonym for more billable steps. The Monitoring view surfaces anomalies quickly — if a retry loop starts hammering an API after a deployment, you get a spend alert rather than discovering the problem in the next billing cycle. Setting a Budget on your agentic workflow service gives you a hard ceiling that triggers before a misbehaving agent runs up an unexpected bill.
## FAQ
### When do I actually need a planner?
Use a planner when the workflow cannot be fully enumerated in advance and the system genuinely needs to decompose the task or choose among tools dynamically.
### Should the planner call tools directly?
Usually no. Keeping planning separate from execution makes failures easier to reason about and safer to control.
### How many retries should an agent get?
Usually fewer than demo videos suggest. In many production workflows, one retry plus one fallback path is a better design than repeated looping.
### What is the best signal that an agent is over-designed?
If a fixed workflow or router can already solve the job, an open-ended loop is usually unnecessary complexity.
### How do I know if recovery is working?
Look at recovery success rate, escalation correctness, and cost per successful task. If retries are rising without better outcomes, the recovery design is weak.