Planning & Reasoning Loops

Access to tools (Chapter 4) is necessary but not sufficient. An agent also needs a "brain": a control loop that decides when to use tools and how to combine them to reach complex goals.

The ReAct Pattern

ReAct stands for Reasoning + Acting. It is the baseline architecture for many modern agents.

Instead of just acting:

Action: search(query)

The agent is forced to think first:

Thought: "The user wants to know the age of the President. First I need to find out who the President is, then find their age."
Action: search("current US President")
Observation: "Joe Biden"
Thought: "Now I search for his age."
Action: search("Joe Biden age")

This inner monologue ties each action to an explicit rationale and to the observations gathered so far, which reduces (but does not eliminate) impulsive hallucination.
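The Thought / Action / Observation cycle can be sketched as a small loop. This is a minimal sketch under stated assumptions, not a definitive implementation: `reactLoop`, `policy`, and the `tools` map are hypothetical names, and `policy` is a scripted stub standing in for the model so the example runs without an LLM or a search API. A real model call would be asynchronous.

```javascript
// Minimal ReAct loop sketch. All names here are illustrative assumptions.
// `policy` stands in for the LLM: given the transcript so far, it returns
// either { thought, action: { tool, input } } or { thought, answer }.
function reactLoop(question, policy, tools, maxSteps = 5) {
  const transcript = [`Question: ${question}`];
  for (let step = 0; step < maxSteps; step++) {
    const decision = policy(transcript);
    transcript.push(`Thought: ${decision.thought}`);
    if (decision.answer !== undefined) {
      return { answer: decision.answer, transcript };
    }
    const { tool, input } = decision.action;
    transcript.push(`Action: ${tool}(${JSON.stringify(input)})`);
    // Run the tool and feed its result back as an observation.
    const observation = tools[tool](input);
    transcript.push(`Observation: ${observation}`);
  }
  return { answer: null, transcript }; // step budget exhausted
}

// Scripted stand-ins (hypothetical) so the sketch runs offline.
const fakeSearch = (query) =>
  query === "current US President" ? "Joe Biden" : "(placeholder age result)";

const scriptedPolicy = (transcript) => {
  const observations = transcript.filter((line) => line.startsWith("Observation:"));
  if (observations.length === 0) {
    return {
      thought: "First I need to find out who the President is.",
      action: { tool: "search", input: "current US President" },
    };
  }
  if (observations.length === 1) {
    return {
      thought: "Now I search for his age.",
      action: { tool: "search", input: "Joe Biden age" },
    };
  }
  return {
    thought: "I have what I need.",
    answer: observations[1].slice("Observation: ".length),
  };
};

const result = reactLoop("How old is the President?", scriptedPolicy, { search: fakeSearch });
console.log(result.transcript.join("\n"));
```

The important design choice is that every observation is appended to the transcript before the next decision, so each new thought is conditioned on what the tools actually returned.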

Plan-and-Solve (Chain of Thought on Steroids)

For multi-step tasks, ReAct can sometimes get "lost in the weeds" of immediate steps and lose sight of the overall goal.

Plan-and-Solve creates an explicit plan before execution starts.

Planner: "Break this goal down into steps."
- Step 1: Fetch data X.
- Step 2: Analyze data X.
- Step 3: Generate report.
Executor: Executes Step 1.
Executor: Executes Step 2...

This separation of concerns (Planning vs. Execution) improves reliability on long-horizon tasks.
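The planner/executor split can be sketched as follows. This is a minimal sketch, assuming a `planner` that returns an ordered list of step descriptions and an `executor` that handles one step at a time; `planAndSolve`, `stubPlanner`, and `stubExecutor` are hypothetical names, and the stubs stand in for model or tool calls that would normally be asynchronous.

```javascript
// Plan-and-Solve sketch: plan once up front, then execute step by step.
// `planner` and `executor` are hypothetical stand-ins for model/tool calls.
function planAndSolve(goal, planner, executor) {
  const plan = planner(goal); // the full plan exists before any step runs
  const results = [];
  for (const [index, step] of plan.entries()) {
    // Each step receives the goal and a copy of all earlier results,
    // so the executor does not have to re-derive the plan.
    const output = executor({ goal, step, index, previous: results.slice() });
    results.push({ step, output });
  }
  return { plan, results };
}

// Stub planner: a fixed three-step plan.
const stubPlanner = (_goal) => ["Fetch data", "Analyze data", "Generate report"];

// Stub executor: only reports what it was asked to do.
const stubExecutor = ({ step, index, previous }) =>
  `done step ${index + 1} (${step}) with ${previous.length} prior result(s)`;

const run = planAndSolve("Produce a report", stubPlanner, stubExecutor);
for (const entry of run.results) console.log(entry.output);
```

Because the plan is a plain array, it can also be logged, shown to a user, or edited before the executor starts.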

Self-Correction & Reflection

Models make mistakes. A robust agent includes a feedback loop to catch them.

Reflection Pattern:

Agent produces an output (code, plan, answer).
Reflector (prompted as a "Critic"): "Review the above output for errors, logical fallacies, or security issues."
Agent receives the critique and regenerates the output.

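// The Reflection Loop: draft, critique, then revise once if the critic finds errors.
// generateCode, critiqueCode, and fixCode are model-backed helpers defined elsewhere.
async function reflectOnce(prompt) {
  let draft = await generateCode(prompt);
  const critique = await critiqueCode(draft);

  if (critique.hasErrors) {
    // Feed the critic's feedback back into the generator.
    draft = await fixCode(draft, critique.feedback);
  }
  return draft;
}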

This works especially well for code generation, where the "Critic" can incorporate the output of a real compiler or linter.
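As an illustration of a tool-backed critic, the sketch below uses JavaScript's built-in `Function` constructor as a stand-in "compiler": it parses the source without running it and throws a `SyntaxError` on invalid code. This is a hedged sketch, not a full linter: `syntaxCritic`, `reflectWithChecker`, `stubGenerate`, and `stubFix` are hypothetical names, the stubs replace real model calls (which would be asynchronous), and the check only covers syntax, not logic or security.

```javascript
// Reflection sketch where the "Critic" is a real syntax check.
// new Function(source) parses the source as a function body without
// running it, and throws a SyntaxError if it is not valid JavaScript.
function syntaxCritic(source) {
  try {
    new Function(source);
    return { hasErrors: false, feedback: "" };
  } catch (err) {
    return { hasErrors: true, feedback: String(err.message) };
  }
}

// `generate` and `fix` are hypothetical stand-ins for model calls.
function reflectWithChecker(prompt, generate, fix, maxRounds = 3) {
  let draft = generate(prompt);
  for (let round = 0; round < maxRounds; round++) {
    const critique = syntaxCritic(draft);
    if (!critique.hasErrors) return { draft, rounds: round, ok: true };
    // Hand the checker's error message back to the generator.
    draft = fix(draft, critique.feedback);
  }
  // Out of rounds: report whether the last fix attempt finally parses.
  return { draft, rounds: maxRounds, ok: !syntaxCritic(draft).hasErrors };
}

// Stubs: the first draft has an unbalanced parenthesis; the "fix" repairs it.
const stubGenerate = (_prompt) => "return (1 + 2;";
const stubFix = (draft, _feedback) => draft.replace("(1 + 2;", "(1 + 2);");

const outcome = reflectWithChecker("add two numbers", stubGenerate, stubFix);
console.log(outcome);
```

Bounding the number of rounds keeps the critic from driving an endless fix-and-recheck cycle when the generator cannot repair the draft.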

Infinite Loops & Stopping Conditions

Automated planning loops are dangerous. They can get stuck in infinite retries ("I failed, let me try again exactly the same way").

Safety Guards:

Max Steps: Hard limit (e.g., 10 steps). If not solved, abort.
Breadth Heuristic: If the agent calls the same tool with exactly the same arguments twice, force a stop or a strategy change.
Human Interrupt: Always allow the user to see the plan and say "Stop" or "Edit Plan" before execution continues.
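The guards above can be combined in one loop. The following is a minimal sketch under stated assumptions: `runGuarded`, `nextAction`, `execute`, and `approve` are hypothetical names, the agent and tool are stubs, and repeated calls are detected by comparing the tool name plus `JSON.stringify` of the arguments (which is sensitive to key order, so real code may want to normalize arguments first).

```javascript
// Guarded loop sketch combining a step limit, a repeated-call check,
// and a human veto. All names are hypothetical.
function runGuarded(nextAction, execute, { maxSteps = 10, approve = () => true } = {}) {
  const seen = new Set();
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = nextAction(history);
    if (action.done) return { status: "solved", steps: step, history };

    // Human interrupt: the caller can veto any action before it runs.
    if (!approve(action)) return { status: "stopped_by_user", steps: step, history };

    // Repeat guard: the same tool with exactly the same arguments again.
    const key = `${action.tool}:${JSON.stringify(action.args)}`;
    if (seen.has(key)) return { status: "repeated_call", steps: step, history };
    seen.add(key);

    history.push({ action, result: execute(action) });
  }
  // Max steps: hard limit reached without a solution.
  return { status: "max_steps", steps: maxSteps, history };
}

// Stub agent that retries the same failing call forever.
const stuckAgent = () => ({ tool: "search", args: { q: "same query" } });
const failingTool = () => "error";

const guarded = runGuarded(stuckAgent, failingTool);
console.log(guarded.status); // prints "repeated_call"
```

Returning a status instead of throwing lets the caller decide whether to abort, ask the user, or switch strategy when a guard trips.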

Summary

ReAct interweaves thinking and acting for dynamic problem solving.
Plan-and-Solve helps maintain focus on long tasks.
Reflection loops allow the agent to fix its own mistakes before the user sees them.

These loops are the "cognitive architecture" of your agent.