Planning & Reasoning Loops

Access to tools (Chapter 4) is necessary but not sufficient. An agent also needs a "brain": a control loop that decides when to use tools and how to combine them to achieve complex goals.

The ReAct Pattern

ReAct stands for Reasoning + Acting. It is the baseline architecture for many modern agents.

Instead of just acting:

Action: search(query)

The agent is forced to think first:

Thought: "The user wants to know the age of the President. First I need to find out who the President is, then find their age."
Action: search("current US President")
Observation: "Joe Biden"
Thought: "Now I search for his age."
Action: search("Joe Biden age")

This inner monologue keeps the model anchored to what it has actually observed and makes impulsive hallucination less likely.
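
A Thought/Action/Observation trace like this can be driven by a small loop. The following is a minimal sketch, not a reference implementation: `runReAct`, `parseStep`, `scriptedModel`, and the `tools` map are hypothetical names, the model is a scripted stand-in so the code runs offline, and calls are synchronous for brevity (a real model or search call would be asynchronous). The search tool returns canned strings, including a placeholder instead of a real age.

```javascript
// Parse one model turn. Assumed output format (an assumption of this sketch):
//   Action: toolName("argument")   or   Final Answer: some text
function parseStep(text) {
  const final = text.match(/Final Answer:\s*(.+)/);
  if (final) return { type: "final", answer: final[1].trim() };
  const action = text.match(/Action:\s*(\w+)\("([^"]*)"\)/);
  if (action) return { type: "action", tool: action[1], input: action[2] };
  return { type: "invalid" };
}

// Alternate model turns and tool calls until a final answer or the step cap.
function runReAct(question, model, tools, maxSteps = 5) {
  let transcript = `Question: ${question}\n`;
  for (let step = 0; step < maxSteps; step++) {
    const output = model(transcript); // Thought + Action, or Final Answer
    transcript += output + "\n";
    const parsed = parseStep(output);
    if (parsed.type === "final") return { answer: parsed.answer, transcript };
    if (parsed.type === "action" && tools[parsed.tool]) {
      // Feed the tool result back so the next Thought can use it.
      transcript += `Observation: ${tools[parsed.tool](parsed.input)}\n`;
    } else {
      transcript += "Observation: could not parse a valid action\n";
    }
  }
  return { answer: null, transcript }; // step budget exhausted
}

// Scripted stand-in for an LLM: picks its turn from the number of observations so far.
function scriptedModel(transcript) {
  const observations = transcript.match(/^Observation: .*$/gm) || [];
  if (observations.length === 0) {
    return 'Thought: First I need to find out who the President is.\nAction: search("current US President")';
  }
  if (observations.length === 1) {
    return 'Thought: Now I search for his age.\nAction: search("Joe Biden age")';
  }
  const last = observations[observations.length - 1].replace("Observation: ", "");
  return `Thought: I have what I need.\nFinal Answer: ${last}`;
}

// Canned tool results; the second value is a placeholder, not real data.
const tools = {
  search: (query) =>
    query === "current US President" ? "Joe Biden" : "<age from search results>",
};

const result = runReAct("How old is the US President?", scriptedModel, tools);
console.log(result.transcript);
```

The step cap keeps the sketch from looping forever if the model never produces a final answer.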

Plan-and-Solve (Chain of Thought on Steroids)

For multi-step tasks, ReAct can sometimes get "lost in the weeds" of immediate steps and lose sight of the overall goal.

Plan-and-Solve creates an explicit plan before execution starts.

  1. Planner: "Break this goal down into steps."
    • Step 1: Fetch data X.
    • Step 2: Analyze data X.
    • Step 3: Generate report.
  2. Executor: Executes Step 1.
  3. Executor: Executes Step 2...

This separation of concerns (Planning vs. Execution) improves reliability on long-horizon tasks.
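
The planner/executor split can be sketched as follows. This is an illustration under stated assumptions: `planAndSolve`, `stubPlanner`, and `stubExecutor` are hypothetical names, both roles are stubbed with plain synchronous functions instead of model calls, and the plan is treated as fixed once it is produced.

```javascript
// Plan-and-Solve sketch: one planning call, then one executor call per step.
// `planner` and `executor` stand in for model calls (assumptions of this sketch).
function planAndSolve(goal, planner, executor) {
  const plan = planner(goal); // array of step descriptions, decided up front
  const results = [];
  for (const step of plan) {
    // The executor sees the overall goal, the current step, and earlier results.
    const output = executor({ goal, step, previous: results.slice() });
    results.push({ step, output });
  }
  return { plan, results };
}

// Stub planner: always returns the same three-step plan.
const stubPlanner = () => ["Fetch data", "Analyze data", "Generate report"];

// Stub executor: reports which step ran and how many results preceded it.
const stubExecutor = ({ step, previous }) =>
  `done: ${step} (after ${previous.length} earlier steps)`;

const run = planAndSolve("Write a report on the data", stubPlanner, stubExecutor);
for (const { step, output } of run.results) {
  console.log(`${step} -> ${output}`);
}
```

Passing the goal to every executor call is one way to keep each step tied to the overall objective rather than only to its immediate input.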

Self-Correction & Reflection

Models make mistakes. A robust agent includes a feedback loop to catch them.

Reflection Pattern:

  1. Agent produces an output (code, plan, answer).
  2. Reflector (the model prompted as a "Critic"): "Review the above output for errors, logical fallacies, or security issues."
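  3. Agent receives the critique and regenerates the output.

// The Reflection Loop: draft, critique, then revise once if needed.
// generateCode, critiqueCode, and fixCode are assumed to be model-backed helpers defined elsewhere.
async function generateWithReflection(prompt) {
  let draft = await generateCode(prompt);
  const critique = await critiqueCode(draft);

  // Regenerate only when the critic reports a problem.
  if (critique.hasErrors) {
    draft = await fixCode(draft, critique.feedback);
  }
  return draft;
}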

This works especially well for code generation, where the "Critic" can include the output of a real compiler or linter rather than only the model's own judgment.
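
One way to ground the Critic in a real tool is to let the language's own parser produce the critique. The sketch below assumes the generated code is the body of a JavaScript function and uses the `Function` constructor, which compiles source without running it and throws a `SyntaxError` if the source does not parse. `syntaxCritic`, `reflect`, and `stubGenerate` are hypothetical names, the generator is a stub rather than a model call, and the check covers syntax only, not logic or security.

```javascript
// Critic backed by a real check: the JavaScript parser.
// new Function(source) compiles the source as a function body without executing it.
function syntaxCritic(source) {
  try {
    new Function(source);
    return { hasErrors: false, feedback: "" };
  } catch (err) {
    return { hasErrors: true, feedback: `${err.name}: ${err.message}` };
  }
}

// Bounded reflection: regenerate until the critic passes or the rounds run out.
// `generate(prompt, feedback)` stands in for a model call (an assumption of this sketch).
function reflect(prompt, generate, critic, maxRounds = 3) {
  let feedback = "";
  let draft = "";
  for (let round = 1; round <= maxRounds; round++) {
    draft = generate(prompt, feedback);
    const critique = critic(draft);
    if (!critique.hasErrors) return { draft, rounds: round, ok: true };
    feedback = critique.feedback; // pass the parser's message into the next attempt
  }
  return { draft, rounds: maxRounds, ok: false };
}

// Stub generator: the first draft has unbalanced brackets; once it receives
// feedback, it returns a corrected draft.
const stubGenerate = (prompt, feedback) =>
  feedback === ""
    ? "return [1, 2, 3].map((n) => { n * 2;"
    : "return [1, 2, 3].map((n) => n * 2);";

const outcome = reflect("double the numbers", stubGenerate, syntaxCritic);
console.log(outcome.ok, outcome.rounds, outcome.draft);
```

Capping the number of rounds matters here for the same reason as in any agent loop: a critic that never passes would otherwise keep the generator running indefinitely.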

Infinite Loops & Stopping Conditions

Automated planning loops are dangerous. They can get stuck in infinite retries ("I failed, let me try again exactly the same way").

Safety Guards:

  1. Max Steps: Hard limit (e.g., 10 steps). If not solved, abort.
  2. Breadth Heuristic: If the agent calls the same tool with exactly the same arguments twice, force a stop or a change of strategy.
  3. Human Interrupt: Always allow the user to see the plan and say "Stop" or "Edit Plan" before execution continues.
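
The three guards above can be combined in a single loop. This is a minimal sketch under stated assumptions: `runGuarded`, `actionKey`, `nextAction`, `execute`, and `approve` are hypothetical names, actions are plain objects of the form `{ tool, args }` or `{ done: true }`, the repeated-action guard stops the run instead of switching strategy, and the human interrupt is modeled as a synchronous callback.

```javascript
// Build a comparable key for an action. JSON.stringify is sensitive to property
// order, so this only catches arguments that serialize identically.
function actionKey(action) {
  return `${action.tool}:${JSON.stringify(action.args)}`;
}

function runGuarded(nextAction, execute, { maxSteps = 10, approve = () => true } = {}) {
  const seen = new Set();
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = nextAction(history);
    if (action.done) return { status: "solved", history };

    // Guard 2: same tool with the same arguments as an earlier step.
    const key = actionKey(action);
    if (seen.has(key)) return { status: "repeated_action", history };

    // Guard 3: let a human veto the action before it runs.
    if (!approve(action)) return { status: "stopped_by_user", history };

    seen.add(key);
    history.push({ action, result: execute(action) });
  }
  // Guard 1: hard step limit reached without a solution.
  return { status: "max_steps", history };
}

// An agent that retries the exact same search is stopped on its second attempt.
const stuck = runGuarded(
  () => ({ tool: "search", args: { q: "same query" } }),
  () => "no results"
);
console.log(stuck.status, stuck.history.length);
```

Returning a distinct status for each guard lets the caller decide what to do next, for example asking the planner for a new strategy after `repeated_action` instead of aborting.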

Summary

  • ReAct interweaves thinking and acting for dynamic problem solving.
  • Plan-and-Solve helps maintain focus on long tasks.
  • Reflection loops allow the agent to fix its own mistakes before the user sees them.

These loops are the "cognitive architecture" of your agent.