Structured Output
You're building a personal assistant that converts natural language into todo items. The user says "Remind me to call mom tomorrow at 5pm" and you want this:
```json
{"task": "Call mom", "due_date": "2026-01-09", "due_time": "17:00", "priority": "medium"}
```

So you write a prompt asking the LLM to return JSON. You test it, and it works beautifully. You ship it. Then you wake up to bug reports. The LLM responded with:
```
Sure! Here's the todo item you requested:
{"task": "Call mom", "due_date": "2026-01-09", "due_time": "17:00", "priority": "medium"}
```

Your `JSON.parse()` chokes on "Sure! Here's the todo item you requested:" and crashes. You add more instructions: "Return ONLY JSON, no explanation." It works 99% of the time. But 1% of users still hit errors. At scale, 1% means hundreds of failures per day.
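To make the failure concrete, here's a minimal sketch of the parse-and-pray approach (the response string is invented for illustration): a direct `json.loads()` fails on the preamble, and the usual substring fallback only works when you get lucky.

```python
import json

# A typical "helpful" LLM response that breaks naive parsing
response = 'Sure! Here\'s the todo item you requested:\n{"task": "Call mom", "priority": "medium"}'

try:
    todo = json.loads(response)  # raises: the preamble is not JSON
except json.JSONDecodeError:
    # Common workaround: scrape out the outermost {...} span and hope for the best
    start, end = response.find("{"), response.rfind("}") + 1
    todo = json.loads(response[start:end])

print(todo["task"])  # Call mom — but only because the fallback got lucky
```

This kind of scraping breaks the moment the model emits nested braces in prose, markdown fences, or two JSON objects.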
This is the fundamental challenge: LLMs don't follow instructions—they approximate them. Asking for JSON gets you JSON most of the time. But "most" isn't good enough for production systems.
What if the model couldn't return anything except valid JSON matching your exact schema?
See It Work: Try in Google AI Studio
Before we dive into code, let's see structured output in action. Open Google AI Studio and try this:
- Select a model (e.g., `gemini-2.5-flash`)
- In the right panel, find Output format and select JSON
- Click Edit schema and paste this:
```json
{
  "type": "object",
  "properties": {
    "task": { "type": "string", "description": "The task to be done" },
    "due_date": { "type": "string", "description": "Date in YYYY-MM-DD format" },
    "due_time": { "type": "string", "description": "Time in HH:MM 24-hour format" },
    "priority": { "type": "string", "enum": ["low", "medium", "high"] }
  },
  "required": ["task", "priority"]
}
```

- Enter a prompt: `Create a todo from: "Remind me to call mom tomorrow at 5pm"`
- Run it.
Notice what you get back: pure JSON. No "Sure!", no markdown, no explanation. Just the data you asked for, matching the exact structure you defined.
Try running it ten times. Try weird inputs. Try inputs in different languages. The output is always valid JSON matching your schema. This isn't the model being cooperative—it's the model being constrained.
How Does This Work?
When you enable structured output, something fundamental changes in how the model generates text.
Normally, an LLM generates tokens one at a time by sampling from a probability distribution. At each step, it looks at all possible next tokens and picks one (weighted by probability). The word "Sure" is a valid token. So is a JSON brace. The model chooses based on what seems most appropriate.
With structured output, we add a constraint. Before the model samples a token, we ask: "Would this token violate the schema?" If yes, that token is removed from consideration. The model literally cannot see it as an option.
If your schema says priority must be one of ["low", "medium", "high"], the model cannot output "priority": "urgent". Those tokens are invisible to it. This is called constrained decoding.
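A toy sketch makes the idea tangible. Real implementations mask token logits against a grammar compiled from the schema, but the principle is the same: schema-violating candidates are removed before sampling ever happens.

```python
# Toy illustration of constrained decoding. Real systems operate on token
# logits and a compiled grammar; here we just filter a candidate list.
schema_allowed = {"low", "medium", "high"}  # enum values from the schema

def pick_priority(candidates: dict[str, float]) -> str:
    # Mask step: schema-violating candidates are removed from consideration
    legal = {tok: p for tok, p in candidates.items() if tok in schema_allowed}
    # Greedy "sampling": take the highest-probability legal token
    return max(legal, key=legal.get)

# The model "prefers" urgent, but that token is invisible under the constraint
print(pick_priority({"urgent": 0.6, "high": 0.3, "medium": 0.1}))  # high
```

Even when the model's raw distribution favors an illegal token, the constrained decoder can only ever emit something your schema allows.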
You might encounter "JSON Mode" in older documentation. JSON Mode guarantees valid JSON syntax—the output will parse. But the model might return {"foo": "bar"} when you wanted {"task": "...", "priority": "..."}.
Structured Output guarantees schema compliance: the right fields, the right types, the required properties. Always prefer structured output when available.
Different providers implement this with slightly different APIs:
| Provider | API |
|---|---|
| OpenAI | response_format with json_schema |
| Anthropic | output_format with json_schema |
| Google Gemini | responseMimeType + responseSchema |
The concept is the same everywhere. Once you understand constrained decoding, you can use any provider.
The Modern Approach: Type-Safe Schemas
Writing raw JSON Schema is tedious and error-prone. Modern agent development uses type-safe schema libraries that give you:
- Autocomplete and type checking in your editor
- Automatic validation of responses
- Documentation as code — the schema is the specification
In Python, we use Pydantic. In TypeScript, we use Zod. The framework converts these to JSON Schema behind the scenes.
Let's build our todo extractor properly.
Python: Pydantic + Google ADK
First, define the structure you want:
```python
from pydantic import BaseModel, Field
from typing import Literal

class TodoItem(BaseModel):
    """A structured todo item extracted from natural language."""

    task: str = Field(description="Clear, actionable description of the task")
    due_date: str | None = Field(
        default=None,
        description="Due date in YYYY-MM-DD format, if mentioned"
    )
    due_time: str | None = Field(
        default=None,
        description="Due time in HH:MM 24-hour format, if mentioned"
    )
    priority: Literal["low", "medium", "high"] = Field(
        description="Priority level inferred from urgency words like 'urgent', 'ASAP', 'when you can'"
    )
```

Notice a few things about this schema:
Field descriptions are prompts. When you write description="Priority level inferred from urgency words...", you're teaching the model how to fill this field. Good descriptions lead to better extraction.
Optional fields use None. If the user says "buy milk" without mentioning a date, we want null—not a hallucinated date. Making fields nullable gives the model an honest escape hatch.
Literal constrains values. The model can only return "low", "medium", or "high". No "urgent", no "MEDIUM", no variations.
Now create an agent that uses this schema:
```python
from google.adk import Agent

todo_extractor = Agent(
    model="gemini-2.5-flash",
    name="todo_extractor",
    instruction="""Extract a todo item from the user's natural language input.

- Identify the core task they want to accomplish
- Look for date/time references (today, tomorrow, next week, specific dates)
- Infer priority from their language:
  * "urgent", "ASAP", "important" → high
  * "when you can", "eventually", "no rush" → low
  * Otherwise → medium

If a date is mentioned relatively (like "tomorrow"), resolve it to YYYY-MM-DD format.""",
    output_schema=TodoItem,
)
```

Run it:
```python
result = todo_extractor.run("Remind me to call mom tomorrow at 5pm - it's important!")
print(result.text)
```

Output:

```json
{"task": "Call mom", "due_date": "2026-01-09", "due_time": "17:00", "priority": "high"}
```

The model resolved "tomorrow" to an actual date. It detected "important" and set priority to "high". And the output is guaranteed to match our TodoItem schema—no parsing exceptions, no retry logic needed.
Let's try another input:
```python
result = todo_extractor.run("maybe pick up groceries sometime")
print(result.text)
```

Output:

```json
{"task": "Pick up groceries", "due_date": null, "due_time": null, "priority": "low"}
```

"Maybe" and "sometime" signaled low priority. No date was mentioned, so the model correctly returned null instead of inventing one.
TypeScript: Zod + Google ADK
TypeScript developers can use the same pattern with Zod schemas. The Google ADK for TypeScript supports structured output via the outputSchema parameter:
```typescript
import { LlmAgent } from '@google/adk';
import { z } from 'zod';

// Define the schema with Zod
const TodoItemSchema = z.object({
  task: z.string().describe('Clear, actionable description of the task'),
  due_date: z.string().nullable().describe('Due date in YYYY-MM-DD format, if mentioned'),
  due_time: z.string().nullable().describe('Due time in HH:MM 24-hour format, if mentioned'),
  priority: z.enum(['low', 'medium', 'high']).describe(
    'Priority level inferred from urgency words'
  ),
});

// Create the agent
const todoExtractor = new LlmAgent({
  model: 'gemini-2.5-flash',
  name: 'todo_extractor',
  instruction: `Extract a todo item from the user's natural language input.

- Identify the core task they want to accomplish
- Look for date/time references and resolve to YYYY-MM-DD format
- Infer priority from language: urgent/ASAP → high, no rush → low, otherwise medium`,
  outputSchema: TodoItemSchema,
});
```

The pattern is identical: define a typed schema, pass it to the agent, get guaranteed structured output. The type system ensures your code correctly handles the response shape.
Designing Schemas That Work
We've seen the mechanics. Now let's explore the craft of schema design—because a well-designed schema produces better results than a sloppy one.
Principle 1: Descriptions Are Part of the Prompt
Compare these two fields:
```python
# Vague
date: str

# Precise
date: str = Field(
    description="Event date in YYYY-MM-DD format. Resolve relative dates like 'tomorrow' or 'next Tuesday' to absolute dates."
)
```

The second version tells the model exactly what you want and how to handle edge cases. This isn't just documentation—it's instruction that shapes the output.
Principle 2: Make Fields Nullable When Data Might Be Missing
This is crucial for avoiding hallucinations. When you make a field required, the model must provide a value. If the input doesn't contain that information, the model will invent something.
```python
# Dangerous: model will hallucinate an email if none is mentioned
email: str

# Safe: model returns null when email isn't in the input
email: str | None = Field(default=None, description="Email address, if explicitly mentioned")
```

Ask yourself: "Is this information always present in the input?" If not, make it nullable.
Principle 3: Constrain Values with Enums
When you need specific categories, use Literal (Python) or z.enum() (TypeScript):
```python
category: Literal["billing", "technical", "account", "general"]
```

The model cannot return "Category: Billing" or "BILLING" or "billing-related". Only your exact values are possible. This eliminates an entire class of downstream bugs.
Principle 4: Structure Complex Data with Nesting
Real-world data often has hierarchy. Use nested objects:
```python
class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    address: Address  # Nested!
    previous_addresses: list[Address] = Field(default_factory=list)
```

The model understands structure. It knows `address` contains its own fields and that `previous_addresses` is a list of address objects.
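You can see this conversion for yourself: Pydantic emits the nested model as a reusable definition inside the generated JSON Schema (sketched here with `model_json_schema()`, assuming Pydantic v2):

```python
from pydantic import BaseModel, Field

class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    address: Address
    previous_addresses: list[Address] = Field(default_factory=list)

schema = Person.model_json_schema()
print(sorted(schema["properties"]))          # ['address', 'name', 'previous_addresses']
print("Address" in schema.get("$defs", {}))  # True — nested model becomes a $defs entry
```

Both `address` and `previous_addresses` reference that single `$defs` entry, so the model sees one consistent definition of an address.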
Handling Edge Cases
Structured output is reliable, but not infallible. There are a few edge cases to handle:
Refusals
If the model refuses a request for safety reasons, it won't follow your schema:
```python
if response.stop_reason == "refusal":
    # Don't try to parse as JSON
    logger.warning(f"Model refused: {response.text}")
```

This is rare for extraction tasks but can happen with certain content.
Truncation
If max_tokens is too low, the output might be cut off mid-JSON. Check for this:
```python
if response.stop_reason == "max_tokens":
    logger.warning("Response truncated - increase max_tokens")
```

Complex Schemas
Providers have complexity limits (nesting depth, total properties, enum values). If your schema is too complex, simplify it or split into multiple extraction steps.
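One way to split, sketched with hypothetical models and invented sample values (the agent calls themselves are elided): extract the shallow top-level fields in one pass, the repeated line items in another, and stitch the results together in code.

```python
from pydantic import BaseModel

# Pass 1 schema: shallow top-level fields only
class ReceiptHeader(BaseModel):
    store_name: str
    date: str
    total: float

# Pass 2 schema: just the repeated line items
class LineItem(BaseModel):
    name: str
    quantity: int
    unit_price: float

# Each agent call targets one small schema; your code stitches the results.
# (Values below stand in for what the two extraction calls would return.)
header = ReceiptHeader(store_name="Corner Deli", date="2026-01-08", total=12.50)
items = [LineItem(name="Coffee", quantity=2, unit_price=3.25)]

receipt = {**header.model_dump(), "items": [i.model_dump() for i in items]}
print(receipt["store_name"], len(receipt["items"]))
```

Two simple schemas are often more reliable than one deeply nested schema that brushes against provider limits.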
Streaming Structured Output
For user-facing applications, there's a UX problem: users have to wait for the entire JSON object before seeing anything. Unlike text streaming where words appear progressively, JSON needs to be complete to be valid.
Or does it?
Modern SDKs support partial streaming—delivering JSON as it's generated, field by field. The task appears while the priority is still generating. The UI feels alive rather than frozen.
Here's how this looks with the Vercel AI SDK using our todo schema:
```tsx
import { useObject } from 'ai/react';
import { z } from 'zod';

const TodoItemSchema = z.object({
  task: z.string(),
  due_date: z.string().nullable(),
  due_time: z.string().nullable(),
  priority: z.enum(['low', 'medium', 'high']),
});

function TodoExtractor() {
  const { object, submit, isLoading } = useObject({
    api: '/api/extract-todo',
    schema: TodoItemSchema,
  });

  return (
    <div>
      <input
        type="text"
        placeholder="Remind me to..."
        onKeyDown={(e) => {
          if (e.key === 'Enter') submit({ input: e.currentTarget.value });
        }}
      />
      {object && (
        <div className="todo-card">
          <h3>{object.task ?? '...'}</h3>
          <span className="priority">{object.priority ?? '...'}</span>
          {object.due_date && <span>{object.due_date} {object.due_time}</span>}
        </div>
      )}
    </div>
  );
}
```

The object updates progressively. `object.task` becomes available before `object.priority`. Your UI updates in real time without any special handling—just render whatever fields exist.
Exercises
Exercise 1: Receipt Parser
Build an agent that extracts structured data from receipt text:
```python
class LineItem(BaseModel):
    name: str
    quantity: int
    unit_price: float

class Receipt(BaseModel):
    store_name: str
    date: str
    items: list[LineItem]
    subtotal: float
    tax: float
    total: float
    payment_method: Literal["cash", "credit", "debit", "mobile"]
```

Test with various receipt formats. What happens when information is ambiguous or missing?
Exercise 2: Email Classifier
Create an agent that classifies incoming emails:
```python
class EmailClassification(BaseModel):
    category: Literal["inquiry", "complaint", "feedback", "spam", "urgent"]
    sentiment: Literal["positive", "neutral", "negative"]
    requires_response: bool
    summary: str = Field(description="One-sentence summary of the email")
    suggested_priority: Literal["low", "medium", "high"]
```

Exercise 3: Graceful Failure
What if the input can't be processed? Modify your todo extractor to handle this:
```python
class ExtractionResult(BaseModel):
    success: bool
    todo: TodoItem | None = None
    error_reason: str | None = Field(
        default=None,
        description="If success=False, explain why extraction wasn't possible"
    )
```

Test with inputs like "The weather is nice today" that don't contain todo-like content.
Key Takeaways
Structured output solves one of the fundamental challenges in agent engineering: bridging probabilistic generation with deterministic systems. By constraining what the model can output at the token level, we eliminate parsing errors and validation failures entirely.
The schema you define isn't just a type specification—it's a communication channel. Good field descriptions teach the model how to fill your structure. Nullable fields prevent hallucination. Enums constrain outputs to exactly what your code expects.
With reliable structured output, your agent can confidently interact with databases, APIs, and user interfaces. The schema is your contract between the AI and your software.
This chapter also closes a larger arc. Earlier, we introduced the agent as a composition of key components: Model, Tools, and Memory. Chapters 1 through 5 covered the first of these, the Model, in depth. Most importantly, we learned how to build the model part of an agent—how to guide the model to behave in a desired way, producing outputs that the other parts of the agent system can reliably consume.

References & Further Reading

For more detail, refer to the official structured output documentation from OpenAI, Anthropic, and Google Gemini.