Workflows
Imagine a newsroom that needs to publish breaking stories in five languages within 30 minutes. Their first instinct: build an "autonomous AI journalist"—an agent that researches, writes, fact-checks, and translates on its own. Three months later, they have a system that occasionally produces brilliant articles but more often hallucinates sources, gets stuck in research loops, or translates headlines into nonsense.
Then they try something simpler: a workflow. A fixed pipeline that takes a human-written draft, runs it through a fact-checker, translates it in parallel to all five languages, and queues it for a human editor. No autonomy. No decision-making. Just a reliable conveyor belt. Error rate drops 90%. Time-to-publish falls to 12 minutes.
This isn't a hypothetical—it's the pattern we see repeatedly. Not every AI system needs a brain. Sometimes you just need a well-designed assembly line.
What is a Workflow?
In the previous chapter, we built an autonomous agent—an LLM that decides its own next steps. A workflow is the opposite: you define the path, and the LLM follows it.
Workflows are systems where LLMs are orchestrated through predefined code paths. Agents are systems where LLMs dynamically direct their own processes.
Workflows are predictable. Agents are flexible. Most production AI systems are workflows—less glamorous, but they ship on time and don't surprise you at 3 AM.
The Four Primitives
Workflows are built from four fundamental patterns—the same primitives you'd find in any distributed systems textbook: Sequential, Parallel, Routing, and Looping. Think of them as LEGO bricks. Each is simple on its own, but combined, they can build sophisticated systems.
1. Sequential
The simplest pattern. You execute LLM calls in a strict order, where the output of one step becomes the input for the next. It's a relay race: each runner hands off the baton to the next.
Example: Blog Post Pipeline
- Outline Generator → Takes a topic, outputs a structured outline
- Gate → Does the outline have 3-5 sections? If not, retry.
- Draft Writer → Takes the outline, writes the full post
- Translator → Takes the draft, translates to Spanish
Each step is a focused, single-purpose LLM call. The "Gate" is optional but powerful—it's a programmatic check (or another LLM call) that ensures quality before proceeding.
You're trading latency for accuracy. A single prompt asking an LLM to "write an outlined, polished, translated blog post" will produce worse results than three specialized prompts. Each LLM call is easier when it has one job.
When to use Sequential:
- Tasks that decompose cleanly into distinct subtasks
- When you need quality gates between steps
- When later steps depend on earlier outputs
- Examples: Document generation, multi-stage analysis, content pipelines
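The blog-post pipeline above can be sketched in a few lines. This is a minimal illustration, not any particular framework's API: it assumes a hypothetical `call_llm(prompt)` helper that wraps whatever model client you use and returns the response text.

```python
def run_pipeline(topic, call_llm, max_retries=3):
    """Sequential pipeline: outline -> gate -> draft -> translate."""
    for _ in range(max_retries):
        outline = call_llm(f"Write a structured outline for: {topic}")
        # Gate: a programmatic check before proceeding (no LLM needed here)
        sections = [line for line in outline.splitlines() if line.strip()]
        if 3 <= len(sections) <= 5:
            break
    else:
        raise ValueError("Gate failed: outline never had 3-5 sections")
    draft = call_llm(f"Write a full blog post from this outline:\n{outline}")
    return call_llm(f"Translate this post to Spanish:\n{draft}")
```

Note that the gate is ordinary code: it can reject and retry a step without ever touching the model, which keeps the failure mode cheap and visible.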
2. Parallel
Why wait for things that can happen at the same time? Parallelization runs multiple LLM calls simultaneously and aggregates the results. This pattern comes in two main forms—sectioning and voting—plus a more advanced variant, orchestrator-workers:
Sectioning
Break a task into independent pieces and run them simultaneously.
Example: Report Generator
- Section 1 LLM: Write the Executive Summary
- Section 2 LLM: Write the Market Analysis
- Section 3 LLM: Write the Financial Projections
- Aggregator: Combine all sections into one document
If each section takes 10 seconds, sectioning reduces total time from 30 seconds to ~10 seconds.
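Sectioning maps directly onto ordinary concurrency tools. Here is a minimal sketch using Python's standard `concurrent.futures`, again assuming a hypothetical `call_llm(prompt)` helper:

```python
from concurrent.futures import ThreadPoolExecutor

def write_report(sections, call_llm):
    """Run one LLM call per section concurrently, then aggregate in order."""
    with ThreadPoolExecutor(max_workers=len(sections)) as pool:
        drafts = list(pool.map(lambda s: call_llm(f"Write the {s} section"), sections))
    # Aggregator: pool.map preserves input order, so sections stay in sequence
    return "\n\n".join(drafts)
```

Threads work here because each call spends its time waiting on network I/O; the aggregation step is just string joining, but it could equally be another LLM call that smooths the seams.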
Voting
Run the same prompt multiple times to get diverse outputs, then select the best.
Example: Code Review
- Run the same "Find security vulnerabilities in this code" prompt 5 times
- Each run might catch different issues (LLMs are non-deterministic)
- Aggregate: Flag any vulnerability mentioned by 2+ runs
Voting is especially useful for high-stakes decisions where you want consensus or for catching edge cases that a single pass might miss.
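The voting aggregation itself is simple bookkeeping. A sketch, assuming a hypothetical `call_llm` that returns a list of short finding labels for each run:

```python
from collections import Counter

def vote_on_findings(code, call_llm, runs=5, threshold=2):
    """Ask the same question several times; keep findings flagged by 2+ runs."""
    counts = Counter()
    for _ in range(runs):
        findings = call_llm(f"Find security vulnerabilities:\n{code}")
        counts.update(set(findings))  # set(): one vote per finding per run
    return [finding for finding, votes in counts.items() if votes >= threshold]
```

Raising `threshold` trades recall for precision: a finding mentioned by four of five runs is far more trustworthy than one mentioned once.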
Orchestrator-Workers (Dynamic Parallelization)
A more advanced variant. Instead of you defining what runs in parallel, a central "Orchestrator" LLM analyzes the input and dynamically decides what workers to spawn.
Example: Research Assistant
- User asks: "Compare the AI strategies of Google, Microsoft, and Amazon"
- Orchestrator decides: Spawn 3 workers, one for each company
- Workers run in parallel, each researching one company
- Synthesizer combines findings into a comparative report
When to use Parallel:
- Independent subtasks with no dependencies
- When latency is critical and tasks can be divided
- When you want diverse perspectives or consensus (voting)
- When the number/nature of subtasks isn't known until runtime (orchestrator-workers)
- Examples: Multi-document analysis, guardrail systems (one LLM responds, another checks for safety), bulk content generation
3. Routing
The "traffic cop" of workflows. An initial LLM call (or traditional classifier) examines the input and directs it to a specialized downstream handler. It's the if/else statement of AI systems.
Example: Customer Support System
- Router: "Classify this message: billing, technical, or general?"
- Billing Handler: Specialized prompt trained on refund policies, payment issues
- Tech Support Handler: Specialized prompt with access to troubleshooting docs
- General Handler: Simple FAQ retrieval
Each handler can have its own tools, context, and even model. The billing handler might use GPT-4 for nuanced policy interpretation, while the FAQ handler uses a cheaper, faster model.
Routing isn't just about specialized prompts—it's also about cost optimization. Route simple queries to fast, cheap models (Gemini Flash, Claude Haiku) and complex ones to powerful models (GPT-4, Claude Opus). Your average cost per query drops significantly.
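Routing really is the if/else of AI systems: one cheap classification call, then a dictionary dispatch. A sketch, assuming a hypothetical `call_llm` and handlers that are just callables:

```python
def handle(message, call_llm, handlers):
    """Router classifies the input, then dispatches to a specialized handler."""
    label = call_llm(
        f"Classify this message as billing, technical, or general: {message}"
    ).strip().lower()
    # Fall back to the general handler if the router returns an unknown label
    handler = handlers.get(label, handlers["general"])
    return handler(message)
```

Each handler in the dictionary can close over its own prompt, tools, and model choice, which is exactly how the cost optimization works: the mapping, not the router, decides which model answers.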
When to use Routing:
- Distinct categories of inputs requiring different handling
- When optimizing for one category would hurt another
- Cost optimization (route by complexity)
- Examples: Customer support, content moderation, multi-tenant systems
4. Looping
Sometimes one pass isn't enough. The Evaluator-Optimizer pattern creates a feedback loop: one LLM generates output, another evaluates it, and if it doesn't meet criteria, the generator tries again.
Example: Essay Polisher
- Generator: Write an essay on climate change
- Evaluator: "Rate this essay 1-10 on clarity, accuracy, and engagement. If below 8, provide specific feedback."
- Loop: If score < 8, feed the essay + feedback back to the Generator
- Exit: When score ≥ 8, output the final essay
The key insight: LLMs are often better at critiquing than creating. The evaluator catches issues the generator missed, and the feedback gives the generator specific direction for improvement.
Always cap your loops. Without a limit, a perfectionist evaluator and a struggling generator will argue forever—like two coworkers debating font choices while burning through your API budget. Three to five iterations is usually enough. If it's not good by then, a human needs to step in.
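The loop, with its cap, fits in a few lines. A sketch assuming two hypothetical callables: `generate(topic, feedback)` returning a draft, and `evaluate(essay)` returning a `(score, feedback)` pair:

```python
def polish(topic, generate, evaluate, max_rounds=3, target=8):
    """Evaluator-optimizer loop with a hard iteration cap."""
    feedback = ""
    for _ in range(max_rounds):
        essay = generate(topic, feedback)
        score, feedback = evaluate(essay)  # e.g. (7, "tighten the intro")
        if score >= target:
            return essay
    # Cap reached: return the best effort and flag it for human review
    return essay
```

The `max_rounds` parameter is the font-debate circuit breaker: when it trips, the output still ships, just with a human in the loop.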
When to use Looping:
- Tasks where iterative refinement provides measurable improvement
- When you have clear, evaluable criteria (scores, checklists)
- When human feedback would help, but you want to automate the first few rounds
- Examples: Code generation with test validation, content editing, translation refinement
Composing Primitives
These four patterns don't exist in isolation. Real systems combine them. Let's look at a practical example.
Example: Automated Code Review System
Consider a system that triages and fixes GitHub issues. All four primitives are at work:
- Sequential: Issue → Parse → Classify → Fix → Test → Merge
- Routing: Complexity classifier dispatches to different handlers
- Parallel: Multiple test suites could run simultaneously (not shown)
- Looping: Failed tests trigger another fix attempt
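The composed pipeline above can be sketched end to end. This is an illustration only, assuming a hypothetical `call_llm(prompt)` helper and a `run_tests(fix)` callable that returns `(passed, feedback)`:

```python
def triage_issue(issue, call_llm, run_tests, max_fix_attempts=3):
    """Sequential spine, with routing on complexity and a test-driven fix loop."""
    parsed = call_llm(f"Summarize this issue: {issue}")                       # Sequential
    complexity = call_llm(f"Classify as simple or complex: {parsed}").strip() # Routing
    model_hint = "small-model" if complexity == "simple" else "large-model"
    feedback = ""
    for _ in range(max_fix_attempts):                                         # Looping
        fix = call_llm(f"[{model_hint}] Propose a fix: {parsed}\n{feedback}")
        passed, feedback = run_tests(fix)
        if passed:
            return fix  # ready for merge
    return None  # loop cap hit: escalate to a human
```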
This is how production systems are built. You don't pick one pattern—you compose them based on the problem.
When to Choose Workflows vs. Agents
Now the practical question: when do you reach for a workflow versus an autonomous agent?
The Core Tradeoff
Workflows give you control. You know exactly what will happen, in what order, and you can debug each step. But they're brittle—if the input doesn't match your expected patterns, the workflow fails.
Agents give you adaptability. They can handle unexpected situations, recover from errors, and find creative solutions. But they're unpredictable—you can't guarantee what they'll do, and debugging emergent behavior is hard.
Decision Guide
| Choose Workflow when... | Choose Agent when... |
|---|---|
| Task has clear, repeatable steps | Task is open-ended or exploratory |
| You can enumerate input categories | You can't predict what steps are needed |
| Predictability matters (finance, healthcare) | Flexibility matters more than consistency |
| You need to optimize cost and latency | Recovery from errors is critical |
| Failures should be loud and obvious | You have strong guardrails in place |
The Hybrid Reality
There's a reason workflows dominated the early days of LLM applications. In 2022-2023, models struggled with multi-step reasoning and reliable tool use. If you asked an early model to "research a topic, draft an article, fact-check it, then translate it," you'd get chaos. The only way to get reliable results was to break tasks into small, predictable steps—workflows.
But the landscape has shifted. Today's frontier models have dramatically better reasoning and tool-use capabilities. They can maintain coherent plans over many steps, recover from errors, and make sensible decisions about what to do next.
The result? The industry is moving toward hybrids. Workflows handle the predictable parts—input validation, output formatting, quality checks. Agents handle the parts that need flexibility—research, problem-solving, exploration. Humans stay in the loop for high-stakes decisions.
Start with workflows. Inject agent-like autonomy only where the task genuinely requires flexibility—and where you have guardrails to catch failures.
Summary
Workflows trade the glamour of autonomous agents for something more valuable: reliability.
The four primitives:
- Sequential — Chain calls, output feeds input
- Parallel — Run simultaneously, aggregate results
- Routing — Classify and dispatch to specialists
- Looping — Generate, evaluate, refine
These compose into sophisticated systems. And while workflows dominated early LLM applications (because models couldn't reason well enough for autonomy), the balance is shifting. As models get smarter, expect to see more hybrids—workflows providing structure and guardrails, agents providing flexibility where it matters.
Next, we'll explore what happens when you need multiple agents working together—and how to coordinate them into agentic systems.