Workflows
Imagine a newsroom that needs to publish breaking stories in five languages within 30 minutes. Their first instinct: build an "autonomous AI journalist"—an agent that researches, writes, fact-checks, and translates on its own. Three months later, they have a system that occasionally produces brilliant articles but more often hallucinates sources, gets stuck in research loops, or translates headlines into nonsense.
Then they try something simpler: a workflow. A fixed pipeline that takes a human-written draft, runs it through a fact-checker, translates it in parallel to all five languages, and queues it for a human editor. No autonomy. No decision-making. Just a reliable conveyor belt. Error rate drops 90%. Time-to-publish falls to 12 minutes.
This isn't a hypothetical—it's the pattern we see repeatedly. Not every AI system needs a brain. Sometimes you just need a well-designed assembly line.
What is a Workflow?
In the previous chapter, we built an autonomous agent—an LLM that decides its own next steps. A workflow is the opposite: you define the path, and the LLM follows it.
Workflows are systems where LLMs are orchestrated through predefined code paths. Agents are systems where LLMs dynamically direct their own processes.
Workflows are predictable. Agents are flexible. Most production AI systems are workflows—less glamorous, but they ship on time and don't surprise you at 3 AM.
The Four Primitives
Workflows are built from four fundamental patterns—the same primitives you'd find in any distributed systems textbook: Sequential, Parallel, Routing, and Looping. Think of them as LEGO bricks. Each is simple on its own, but combined, they can build sophisticated systems.
1. Sequential
The simplest pattern. You execute LLM calls in a strict order, where the output of one step becomes the input for the next. It's a relay race: each runner hands off the baton to the next.
Example: Blog Post Pipeline
- Outline Generator → Takes a topic, outputs a structured outline
- Gate → Does the outline have 3-5 sections? If not, retry.
- Draft Writer → Takes the outline, writes the full post
- Translator → Takes the draft, translates to Spanish
Each step is a focused, single-purpose LLM call. The "Gate" is optional but powerful—it's a programmatic check (or another LLM call) that ensures quality before proceeding.
You're trading latency for accuracy. A single prompt asking an LLM to "write an outlined, polished, translated blog post" will produce worse results than three specialized prompts. Each LLM call is easier when it has one job.
When to use Sequential:
- Tasks that decompose cleanly into distinct subtasks
- When you need quality gates between steps
- When later steps depend on earlier outputs
- Examples: Document generation, multi-stage analysis, content pipelines
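The blog-post pipeline above can be sketched in a few lines. This is a minimal illustration, not any particular framework's API: it assumes a hypothetical `call_llm(prompt)` helper that wraps whatever model client you use and returns the response text.

```python
def run_pipeline(topic, call_llm, max_retries=3):
    """Sequential pipeline: outline -> gate -> draft -> translate."""
    for _ in range(max_retries):
        outline = call_llm(f"Write a structured outline for: {topic}")
        # Gate: a programmatic check before proceeding (no LLM needed here)
        sections = [line for line in outline.splitlines() if line.strip()]
        if 3 <= len(sections) <= 5:
            break
    else:
        raise ValueError("Gate failed: outline never had 3-5 sections")
    draft = call_llm(f"Write a full blog post from this outline:\n{outline}")
    return call_llm(f"Translate this post to Spanish:\n{draft}")
```

Note that the gate is ordinary code: it can reject and retry a step without ever touching the model, which keeps the failure mode cheap and visible.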
2. Parallel
Why wait for things that can happen at the same time? Parallelization runs multiple LLM calls simultaneously and aggregates the results. This pattern comes in two main forms—sectioning and voting—plus a more advanced variant, orchestrator-workers:
Sectioning
Break a task into independent pieces and run them simultaneously.
Example: Report Generator
- Section 1 LLM: Write the Executive Summary
- Section 2 LLM: Write the Market Analysis
- Section 3 LLM: Write the Financial Projections
- Aggregator: Combine all sections into one document
If each section takes 10 seconds, sectioning reduces total time from 30 seconds to ~10 seconds.
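Sectioning maps directly onto ordinary concurrency tools. Here is a minimal sketch using Python's standard `concurrent.futures`, again assuming a hypothetical `call_llm(prompt)` helper:

```python
from concurrent.futures import ThreadPoolExecutor

def write_report(sections, call_llm):
    """Run one LLM call per section concurrently, then aggregate in order."""
    with ThreadPoolExecutor(max_workers=len(sections)) as pool:
        drafts = list(pool.map(lambda s: call_llm(f"Write the {s} section"), sections))
    # Aggregator: pool.map preserves input order, so sections stay in sequence
    return "\n\n".join(drafts)
```

Threads work here because each call spends its time waiting on network I/O; the aggregation step is just string joining, but it could equally be another LLM call that smooths the seams.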
Voting
Run the same prompt multiple times to get diverse outputs, then select the best.
Example: Code Review
- Run the same "Find security vulnerabilities in this code" prompt 5 times
- Each run might catch different issues (LLMs are non-deterministic)
- Aggregate: Flag any vulnerability mentioned by 2+ runs
Voting is especially useful for high-stakes decisions where you want consensus or for catching edge cases that a single pass might miss.
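The voting aggregation itself is simple bookkeeping. A sketch, assuming a hypothetical `call_llm` that returns a list of short finding labels for each run:

```python
from collections import Counter

def vote_on_findings(code, call_llm, runs=5, threshold=2):
    """Ask the same question several times; keep findings flagged by 2+ runs."""
    counts = Counter()
    for _ in range(runs):
        findings = call_llm(f"Find security vulnerabilities:\n{code}")
        counts.update(set(findings))  # set(): one vote per finding per run
    return [finding for finding, votes in counts.items() if votes >= threshold]
```

Raising `threshold` trades recall for precision: a finding mentioned by four of five runs is far more trustworthy than one mentioned once.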
Orchestrator-Workers (Dynamic Parallelization)
A more advanced variant. Instead of you defining what runs in parallel, a central "Orchestrator" LLM analyzes the input and dynamically decides what workers to spawn.
Example: Research Assistant
- User asks: "Compare the AI strategies of Google, Microsoft, and Amazon"
- Orchestrator decides: Spawn 3 workers, one for each company
- Workers run in parallel, each researching one company
- Synthesizer combines findings into a comparative report
When to use Parallel:
- Independent subtasks with no dependencies
- When latency is critical and tasks can be divided
- When you want diverse perspectives or consensus (voting)
- When the number/nature of subtasks isn't known until runtime (orchestrator-workers)
- Examples: Multi-document analysis, guardrail systems (one LLM responds, another checks for safety), bulk content generation
3. Routing
The "traffic cop" of workflows. An initial LLM call (or traditional classifier) examines the input and directs it to a specialized downstream handler. It's the if/else statement of AI systems.
Example: Customer Support System
- Router: "Classify this message: billing, technical, or general?"
- Billing Handler: Specialized prompt trained on refund policies, payment issues
- Tech Support Handler: Specialized prompt with access to troubleshooting docs
- General Handler: Simple FAQ retrieval
Each handler can have its own tools, context, and even model. The billing handler might use GPT-4 for nuanced policy interpretation, while the FAQ handler uses a cheaper, faster model.
Routing isn't just about specialized prompts—it's also about cost optimization. Route simple queries to fast, cheap models (Gemini Flash, Claude Haiku) and complex ones to powerful models (GPT-4, Claude Opus). Your average cost per query drops significantly.
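Routing really is the if/else of AI systems: one cheap classification call, then a dictionary dispatch. A sketch, assuming a hypothetical `call_llm` and handlers that are just callables:

```python
def handle(message, call_llm, handlers):
    """Router classifies the input, then dispatches to a specialized handler."""
    label = call_llm(
        f"Classify this message as billing, technical, or general: {message}"
    ).strip().lower()
    # Fall back to the general handler if the router returns an unknown label
    handler = handlers.get(label, handlers["general"])
    return handler(message)
```

Each handler in the dictionary can close over its own prompt, tools, and model choice, which is exactly how the cost optimization works: the mapping, not the router, decides which model answers.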
When to use Routing:
- Distinct categories of inputs requiring different handling
- When optimizing for one category would hurt another
- Cost optimization (route by complexity)
- Examples: Customer support, content moderation, multi-tenant systems
4. Looping
Sometimes one pass isn't enough. The Evaluator-Optimizer pattern creates a feedback loop: one LLM generates output, another evaluates it, and if it doesn't meet criteria, the generator tries again.
Example: Essay Polisher
- Generator: Write an essay on climate change
- Evaluator: "Rate this essay 1-10 on clarity, accuracy, and engagement. If below 8, provide specific feedback."
- Loop: If score < 8, feed the essay + feedback back to the Generator
- Exit: When score ≥ 8, output the final essay
The key insight: LLMs are often better at critiquing than creating. The evaluator catches issues the generator missed, and the feedback gives the generator specific direction for improvement.
Always cap your loops. Without a limit, a perfectionist evaluator and a struggling generator will argue forever—like two coworkers debating font choices while burning through your API budget. Three to five iterations is usually enough. If it's not good by then, a human needs to step in.
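The loop, with its cap, fits in a few lines. A sketch assuming two hypothetical callables: `generate(topic, feedback)` returning a draft, and `evaluate(essay)` returning a `(score, feedback)` pair:

```python
def polish(topic, generate, evaluate, max_rounds=3, target=8):
    """Evaluator-optimizer loop with a hard iteration cap."""
    feedback = ""
    for _ in range(max_rounds):
        essay = generate(topic, feedback)
        score, feedback = evaluate(essay)  # e.g. (7, "tighten the intro")
        if score >= target:
            return essay
    # Cap reached: return the best effort and flag it for human review
    return essay
```

The `max_rounds` parameter is the font-debate circuit breaker: when it trips, the output still ships, just with a human in the loop.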
When to use Looping:
- Tasks where iterative refinement provides measurable improvement
- When you have clear, evaluable criteria (scores, checklists)
- When human feedback would help, but you want to automate the first few rounds
- Examples: Code generation with test validation, content editing, translation refinement
Composing Primitives
These four patterns don't exist in isolation. Real systems combine them. Let's look at a practical example.
Example: Automated Code Review System
Consider a system that triages and fixes GitHub issues. All four primitives are at work:
- Sequential: Issue → Parse → Classify → Fix → Test → Merge
- Routing: Complexity classifier dispatches to different handlers
- Parallel: Multiple test suites could run simultaneously (not shown)
- Looping: Failed tests trigger another fix attempt
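The composed pipeline above can be sketched end to end. This is an illustration only, assuming a hypothetical `call_llm(prompt)` helper and a `run_tests(fix)` callable that returns `(passed, feedback)`:

```python
def triage_issue(issue, call_llm, run_tests, max_fix_attempts=3):
    """Sequential spine, with routing on complexity and a test-driven fix loop."""
    parsed = call_llm(f"Summarize this issue: {issue}")                       # Sequential
    complexity = call_llm(f"Classify as simple or complex: {parsed}").strip() # Routing
    model_hint = "small-model" if complexity == "simple" else "large-model"
    feedback = ""
    for _ in range(max_fix_attempts):                                         # Looping
        fix = call_llm(f"[{model_hint}] Propose a fix: {parsed}\n{feedback}")
        passed, feedback = run_tests(fix)
        if passed:
            return fix  # ready for merge
    return None  # loop cap hit: escalate to a human
```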
This is how production systems are built. You don't pick one pattern—you compose them based on the problem.
When to Choose Workflows vs. Agents
Now the practical question: when do you reach for a workflow versus an autonomous agent?
The Core Tradeoff
Workflows give you control. You know exactly what will happen, in what order, and you can debug each step. But they're brittle—if the input doesn't match your expected patterns, the workflow fails.
Agents give you adaptability. They can handle unexpected situations, recover from errors, and find creative solutions. But they're unpredictable—you can't guarantee what they'll do, and debugging emergent behavior is hard.
Decision Guide
| Choose Workflow when... | Choose Agent when... |
|---|---|
| Task has clear, repeatable steps | Task is open-ended or exploratory |
| You can enumerate input categories | You can't predict what steps are needed |
| Predictability matters (finance, healthcare) | Flexibility matters more than consistency |
| You need to optimize cost and latency | Recovery from errors is critical |
| Failures should be loud and obvious | You have strong guardrails in place |
The Hybrid Reality
There's a reason workflows dominated the early days of LLM applications. In 2022-2023, models struggled with multi-step reasoning and reliable tool use. If you asked an early model to "research a topic, draft an article, fact-check it, then translate it," you'd get chaos. The only way to get reliable results was to break tasks into small, predictable steps—workflows.
But the landscape has shifted. Today's frontier models have dramatically better reasoning and tool-use capabilities. They can maintain coherent plans over many steps, recover from errors, and make sensible decisions about what to do next.
The result? The industry is moving toward hybrids. Workflows handle the predictable parts—input validation, output formatting, quality checks. Agents handle the parts that need flexibility—research, problem-solving, exploration. Humans stay in the loop for high-stakes decisions.
Start with workflows. Inject agent-like autonomy only where the task genuinely requires flexibility—and where you have guardrails to catch failures.
Summary
Workflows trade the glamour of autonomous agents for something more valuable: reliability.
The four primitives:
- Sequential — Chain calls, output feeds input
- Parallel — Run simultaneously, aggregate results
- Routing — Classify and dispatch to specialists
- Looping — Generate, evaluate, refine
These compose into sophisticated systems. And while workflows dominated early LLM applications (because models couldn't reason well enough for autonomy), the balance is shifting. As models get smarter, expect to see more hybrids—workflows providing structure and guardrails, agents providing flexibility where it matters.
Next, we'll explore what happens when you need multiple agents working together—and how to coordinate them into agentic systems.