Planning & Reasoning Loops
Access to tools (Chapter 4) is necessary but not sufficient. An agent also needs a "brain": a control loop that decides when to use tools and how to combine them to reach complex goals.
The ReAct Pattern
ReAct stands for Reasoning + Acting. It is the baseline architecture for modern agents.
Instead of just acting:
Action: search(query)
The agent is forced to think first:
Thought: "The user wants to know the age of the President. First I need to find out who the President is, then find their age."
Action: search("current US President")
Observation: "Joe Biden"
Thought: "Now I search for his age."
Action: search("Joe Biden age")
This inner monologue ties each action to an explicit rationale and to earlier observations, which grounds the model and reduces impulsive hallucination.
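The Thought/Action/Observation cycle above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not a reference implementation: `reactLoop`, the `llm` callback, the `tools` map, and the step fields (`thought`, `action`, `input`, `answer`) are hypothetical names chosen for illustration. A real agent would have to parse these fields out of model output.

```javascript
// Minimal ReAct loop sketch. `llm` and `tools` are hypothetical and injected by the caller.
// Assumption: `llm(transcript)` resolves to either
//   { thought, action, input }  -> call a tool, or
//   { thought, answer }         -> finish.
async function reactLoop(question, llm, tools, maxSteps = 5) {
  const transcript = [`Question: ${question}`];
  for (let step = 0; step < maxSteps; step++) {
    const next = await llm(transcript);
    transcript.push(`Thought: ${next.thought}`);
    if (next.answer !== undefined) {
      return { answer: next.answer, transcript };
    }
    const tool = tools[next.action];
    // An unknown tool becomes an observation so the model can recover on the next turn.
    const observation = tool
      ? await tool(next.input)
      : `Unknown tool: ${next.action}`;
    transcript.push(`Action: ${next.action}(${JSON.stringify(next.input)})`);
    transcript.push(`Observation: ${observation}`);
  }
  // Step budget exhausted without a final answer.
  return { answer: null, transcript };
}
```

Because each observation is appended to the transcript before the next model call, every new thought can build on what the previous action returned.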
Plan-and-Solve (Chain of Thought on Steroids)
For multi-step tasks, ReAct can sometimes get "lost in the weeds" of immediate steps and lose sight of the overall goal.
Plan-and-Solve creates an explicit plan before execution starts.
- Planner: "Break this goal down into steps."
  - Step 1: Fetch data X.
  - Step 2: Analyze data Y.
  - Step 3: Generate report.
- Executor: Executes Step 1.
- Executor: Executes Step 2...
This separation of concerns (Planning vs. Execution) improves reliability on long-horizon tasks.
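The planner/executor split can be sketched as follows. This is an illustrative sketch only: `planAndSolve`, `planner`, and `executor` are hypothetical names, and the planner is assumed to return the plan as an array of step strings.

```javascript
// Plan-and-Solve sketch: plan once up front, then execute the steps in order.
// `planner` and `executor` are hypothetical callbacks (typically LLM calls).
async function planAndSolve(goal, planner, executor) {
  // Assumption: the planner resolves to an array of step descriptions.
  const plan = await planner(goal);
  const results = [];
  for (const [index, step] of plan.entries()) {
    // Each step receives the overall goal and all earlier results,
    // so execution stays anchored to the plan rather than the last observation.
    const output = await executor({ goal, step, index, previous: [...results] });
    results.push({ step, output });
  }
  return { plan, results };
}
```

Passing the goal to every executor call is one way to keep the overall objective in view while the agent works through individual steps.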
Self-Correction & Reflection
Models make mistakes. A robust agent includes a feedback loop to catch them.
Reflection Pattern:
- Agent produces an output (code, plan, answer).
- Reflector (prompted as a "Critic"): "Review the above output for errors, logical fallacies, or security issues."
- Agent receives the critique and regenerates the output.
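// The Reflection Loop
// generateCode, critiqueCode, and fixCode are assumed to be LLM-backed helpers defined elsewhere.
async function generateWithReflection(prompt) {
  let draft = await generateCode(prompt);
  const critique = await critiqueCode(draft);
  if (critique.hasErrors) {
    // Regenerate once, using the critic's feedback.
    draft = await fixCode(draft, critique.feedback);
  }
  return draft;
}
This is remarkably effective for code generation, where the "Critic" can even include the output of a real compiler or linter.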
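The single critique pass above can be extended to a bounded loop whose critic is a real parser. The sketch below is one possible shape, not a definitive implementation: `syntaxCritic`, `reflect`, and the `generate`/`fix` callbacks are hypothetical names. The critic relies on the standard `Function` constructor, which throws a `SyntaxError` when given an invalid JavaScript function body and does not execute the code.

```javascript
// Critic backed by a real parser instead of a second LLM call.
function syntaxCritic(code) {
  try {
    new Function(code); // parses the code as a function body; nothing is executed
    return { hasErrors: false, feedback: "" };
  } catch (err) {
    return { hasErrors: true, feedback: String(err.message) };
  }
}

// Bounded reflection loop. `generate` and `fix` are hypothetical LLM-backed callbacks.
async function reflect(generate, fix, critic, maxRounds = 3) {
  let draft = await generate();
  for (let fixes = 0; fixes < maxRounds; fixes++) {
    const critique = await critic(draft);
    if (!critique.hasErrors) return { draft, fixes, ok: true };
    draft = await fix(draft, critique.feedback);
  }
  // Check the last revision once more so the caller knows whether it passed.
  const last = await critic(draft);
  return { draft, fixes: maxRounds, ok: !last.hasErrors };
}
```

Capping the number of rounds keeps the reflection loop from turning into the kind of endless retry that stopping conditions are meant to prevent.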
Infinite Loops & Stopping Conditions
Unbounded planning loops are dangerous. An agent can get stuck retrying the same failing step indefinitely ("I failed, let me try again exactly the same way") without making any progress.
Safety Guards:
- Max Steps: Hard limit (e.g., 10 steps). If not solved, abort.
- Breadth Heuristic: If the agent calls the same tool with exactly the same arguments twice, force a stop or a change of strategy.
- Human Interrupt: Always allow the user to see the plan and say "Stop" or "Edit Plan" before execution continues.
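The first two guards can be combined in a small helper that the agent loop consults before every tool call. This is a sketch under assumptions: `createLoopGuard`, the `check` function, and the `reason` strings are hypothetical names, and `JSON.stringify` is used as a simple argument fingerprint (it treats objects with different key order as different calls).

```javascript
// Guard sketch combining a step budget with repeated-call detection.
function createLoopGuard(maxSteps = 10) {
  const seen = new Set();
  let steps = 0;
  return function check(toolName, args) {
    steps += 1;
    // Max Steps: hard limit on the number of tool calls.
    if (steps > maxSteps) return { ok: false, reason: "max_steps" };
    // Repeat detection: same tool with exactly the same arguments.
    const key = `${toolName}:${JSON.stringify(args)}`;
    if (seen.has(key)) return { ok: false, reason: "repeated_call" };
    seen.add(key);
    return { ok: true, reason: null };
  };
}
```

When `check` returns `ok: false`, the calling loop can abort or prompt the model for a different strategy, depending on the `reason`.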
Summary
- ReAct interweaves thinking and acting for dynamic problem solving.
- Plan-and-Solve helps maintain focus on long tasks.
- Reflection loops allow the agent to fix its own mistakes before the user sees them.
These loops are the "cognitive architecture" of your agent.