Agent Fundamentals

Before we dive into theory, let's build something.

In the next 10 minutes, you'll have a working AI agent—one that thinks, uses tools, and acts. Then we'll step back and understand exactly what makes it an "agent" rather than just a chatbot.


Build Your First Agent

Imagine you're building a game where outcomes depend on chance—but you want the AI to narrate dramatically based on actual dice rolls. A regular chatbot can't do this: if you ask it to "roll a d20," it will make up a number. It might even be biased toward dramatic results (always rolling 1s and 20s for good storytelling).

You need an agent that can call a real dice function and react to the actual outcome.

Let's build exactly that: an AI Dungeon Master that narrates tabletop RPG adventures. When you attempt a risky action ("I leap across the lava pit"), it:

  1. Decides it needs to determine success or failure
  2. Calls a tool (a d20 dice roll) to get a random outcome
  3. Narrates the result dramatically

Prerequisites

1. Get Your Gemini API Key

We'll use Google's Gemini API because it offers a generous free tier—perfect for experimentation. The principles transfer to OpenAI, Anthropic, or any provider.

  1. Go to Google AI Studio → API Keys
  2. Click Create API Key
  3. Copy it somewhere safe

2. Python Environment

You'll need Python 3.10 or later:

# Create a project directory
mkdir dungeon-master-agent && cd dungeon-master-agent
 
# Create a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install Google ADK

ADK (Agent Development Kit) is Google's open-source framework for building agents. It handles the agent loop, tool orchestration, and model communication.

pip install google-adk

Framework Agnostic

This tutorial uses ADK for its simplicity, but the patterns apply to any framework: OpenAI Agents SDK, LangGraph, CrewAI, or raw API calls. Once you understand what an agent does, the how is just syntax.

Project Structure

ADK uses a specific folder structure. Create these files:

dungeon-master-agent/
├── dungeon_master/
│   ├── __init__.py
│   ├── agent.py
│   └── .env

Step 1: Set Your API Key

Create the .env file inside the dungeon_master/ folder:

echo 'GOOGLE_API_KEY="YOUR_API_KEY_HERE"' > dungeon_master/.env

Replace YOUR_API_KEY_HERE with your actual key.

Step 2: Create the Agent

Create dungeon_master/__init__.py (can be empty):

# dungeon_master/__init__.py

Now the main file, dungeon_master/agent.py:

# dungeon_master/agent.py
import random
from google.adk.agents.llm_agent import Agent
 
# --- The Tool: Fate's Dice ---
def roll_d20() -> dict:
    """
    Rolls a 20-sided die to determine the outcome of an action.
    Returns the result and its meaning:
    - 1: Critical Failure
    - 2-9: Failure
    - 10-14: Partial Success
    - 15-19: Success
    - 20: Critical Success
    """
    result = random.randint(1, 20)
    if result == 1:
        meaning = "CRITICAL FAILURE"
    elif result <= 9:
        meaning = "Failure"
    elif result <= 14:
        meaning = "Partial Success"
    elif result <= 19:
        meaning = "Success"
    else:
        meaning = "CRITICAL SUCCESS"
    
    return {"roll": result, "meaning": meaning}
 
 
# --- The Agent: The Dungeon Master ---
root_agent = Agent(
    name="dungeon_master",
    model="gemini-2.0-flash",
    description="A dramatic AI Dungeon Master for text-based adventures.",
    instruction="""You are a dramatic and creative Dungeon Master.
 
When a player attempts an action with an uncertain outcome, you MUST:
1. Call the roll_d20 tool to determine fate.
2. Narrate the outcome based on the roll:
   - 1 (Critical Failure): Catastrophic, humiliating, possibly dangerous consequences.
   - 2-9 (Failure): The action fails, but not disastrously.
   - 10-14 (Partial Success): Mixed results, complications arise.
   - 15-19 (Success): The action succeeds cleanly.
   - 20 (Critical Success): Heroic, legendary success beyond expectations.
 
Be vivid. Be dramatic. Make the player feel the weight of fate.""",
    tools=[roll_d20],
)

Step 3: Run Your Agent

From your project root (not inside dungeon_master/), run:

adk run dungeon_master

You'll see an interactive prompt. Try it:

> I attempt to leap across the bubbling lava pit.

Example Output:

[Tool Call: roll_d20()]
[Result: {"roll": 3, "meaning": "Failure"}]
 
Dungeon Master: You take a running start, muscles coiled like springs, and 
launch yourself toward the far edge. For a glorious moment, you're airborne—
wind in your hair, destiny in your grasp.
 
Then gravity remembers you exist.
 
Your foot catches the crumbling edge and you tumble backward, barely 
managing to grab a rocky outcropping. You dangle above the molten rock, 
heat searing your legs, arms screaming in protest. You're alive... for now.
 
What do you do?

What Just Happened?

Congratulations—you've built an agent. But what exactly makes this an "agent" rather than just a ChatGPT wrapper?

Let's trace what happened when you typed "I leap across the lava pit":

Three distinct phases:

  1. Reasoning: the LLM understood that "leap across the lava pit" requires determining success, and chose to call roll_d20. (Probabilistic: an LLM decision.)
  2. Tool Execution: random.randint(1, 20) ran and returned 3. (Deterministic: plain code.)
  3. Generation: grounded by "roll = 3 = Failure", the LLM generated a vivid narrative. (Probabilistic: LLM creativity.)

This interplay—deterministic tools grounding probabilistic generation—is the essence of agent engineering.

Why the Tool Matters

What if we had skipped the tool entirely and just asked the LLM: "Roll a d20 and narrate the result"?

The model would make up a number. Worse, it would be biased—favoring dramatic rolls (1s and 20s) because that makes for better storytelling. Run it 100 times and you'd see far more critical successes and failures than probability allows.

The roll_d20() function ensures actual randomness: code you can trust, grounding an LLM you can't fully predict. This pattern—deterministic tools grounding probabilistic generation—shows up everywhere in agent engineering.
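If you want to convince yourself that the tool really is fair, you can check it directly. This quick sketch counts how often the die lands on a critical (1 or 20) over many rolls; true randomness puts that near 10%, with no storytelling bias:

```python
# Sanity-check the dice: real randomness gives each face ~5% probability,
# so criticals (1 or 20 combined) should land near 10% of all rolls.
import random
from collections import Counter

random.seed(42)  # fixed seed so this demo is reproducible
counts = Counter(random.randint(1, 20) for _ in range(100_000))
crit_rate = (counts[1] + counts[20]) / 100_000
print(f"Critical rate: {crit_rate:.1%}")  # close to 10%
```

An LLM asked to "roll" its own dice won't pass this test; the tool always will.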


Chatbot vs. Agent

Now you can feel the difference. Consider these two interactions:

Chatbot:

You: Find me flights to Tokyo next month under $800.
Bot: I can't search flights, but I recommend checking Google Flights or Kayak...

Agent:

You: Find me flights to Tokyo next month under $800.
Agent: [Searches flight API] [Compares 47 options] [Filters by price]
Found 3 options. The best is United departing March 15, returning March 22, at $743. Should I book it?

Same prompt. Different capability.

The Dungeon Master you just built is a simple agent—but it demonstrates the core pattern. It decided to roll the dice. It executed the roll. It adapted its response based on the result. A chatbot would have made up the outcome or refused to play.

Chatbot                           Agent
Responds to a message             Pursues a goal across multiple steps
Generates text                    Generates text and takes actions
You do the work; it assists       It does the work; you supervise

A chatbot is a tool you use. An agent is a system that works for you.


The Five Pillars

Look at the code you just wrote. What are the moving parts?

  • A model (gemini-2.0-flash) — the intelligence
  • A tool (roll_d20) — the action
  • An instruction (the system prompt) — the behavior
  • A loop (ADK's Agent handles this) — the orchestration
  • Memory (conversation history) — the context

These aren't arbitrary. Every AI agent—from your Dungeon Master to Google Deep Research—is built from the same five components:

🧠 Model — The Brain

Gemini 2.0 Flash provided the reasoning in your agent. It read your message, understood that "leap across lava" has an uncertain outcome, and decided to call the dice tool. Then it crafted a narrative based on the result.

→ Deep dive: LLM Fundamentals, Prompt Engineering

🔧 Tools — The Hands

The roll_d20() function is a tool—a way for the model to affect something outside of text generation. Tools are how agents interact with the world: search the web, execute code, query databases, call APIs.

Notice how simple it is: just a Python function with a docstring. The model reads the docstring to understand what the tool does and when to use it.

→ Deep dive: Tools & Function Calling, MCP
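To see why the docstring matters, here's a simplified sketch of the idea: a framework can read a function's name, signature, and docstring and turn them into a description the model sees. The describe_tool helper below is illustrative only, not ADK's actual internals:

```python
# Illustrative sketch: turning a plain Python function into a tool
# description a model could read. Not ADK's real implementation.
import inspect

def describe_tool(fn) -> dict:
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {name: str(p.annotation) for name, p in sig.parameters.items()},
    }

def roll_d20() -> dict:
    """Rolls a 20-sided die to determine the outcome of an action."""
    ...

schema = describe_tool(roll_d20)
print(schema["name"])  # roll_d20
```

This is why a clear, specific docstring is the single highest-leverage line in a tool definition: it is effectively the model's documentation for when and how to call it.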

🔄 Agent Loop — The Orchestrator

ADK handled the loop: receive input → send to model → execute tool call → feed result back → get final response. This think-act-observe cycle is what makes an agent an agent. Without it, you just have a one-shot text generator.

→ Deep dive: Building Agents, Workflows
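The loop itself is small enough to sketch. This toy version (not ADK's code; call_model is a stand-in that requests one tool call and then narrates) shows the think-act-observe cycle in about fifteen lines:

```python
# The shape of the loop ADK runs for you: a toy sketch, not ADK's code.
import random

def roll_d20() -> dict:
    return {"roll": random.randint(1, 20)}

def call_model(history):
    # Stand-in for the LLM: ask for a tool call first, then "narrate".
    tool_results = [m for m in history if m["role"] == "tool"]
    if not tool_results:
        return {"tool_call": "roll_d20"}
    roll = tool_results[-1]["result"]["roll"]
    return {"text": f"The die shows {roll}. Fate has spoken."}

def agent_loop(user_input, tools):
    history = [{"role": "user", "content": user_input}]
    while True:
        reply = call_model(history)                           # think
        if "tool_call" in reply:
            result = tools[reply["tool_call"]]()              # act
            history.append({"role": "tool", "result": result})  # observe
        else:
            return reply["text"]                              # final answer

print(agent_loop("I leap across the lava pit.", {"roll_d20": roll_d20}))
```

Swap the stand-in for a real model call and the toy tool for real functions, and this is structurally what every agent framework does.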

📝 Memory — The Notebook

Even your simple agent has memory: it remembers the conversation history. Say "I'm still dangling above the lava," and it knows what you're referring to. More sophisticated agents remember user preferences across sessions, track task progress, and learn from past interactions.

→ Deep dive: Memory & Persistence
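At its simplest, this kind of memory is nothing exotic: it's just the growing list of messages that gets sent to the model on every turn. A minimal sketch:

```python
# Minimal sketch: short-term "memory" is just the message history
# the agent resends to the model on each turn.
history = []

def remember(role: str, content: str) -> None:
    history.append({"role": role, "content": content})

remember("user", "I leap across the lava pit.")
remember("assistant", "You dangle above the molten rock...")
remember("user", "I'm still dangling above the lava!")

# Because all three messages are sent together, the model knows
# exactly what "still dangling" refers to.
print(len(history))  # 3
```

Longer-term memory (preferences, task state across sessions) adds storage and retrieval on top, but the principle is the same: context the model sees is context the agent remembers.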

📚 Retrieval — The Library

This one your Dungeon Master doesn't use—but imagine if it did. You could give it a PDF of D&D spell descriptions, and it could look up "Feather Fall" when you cast it. Retrieval (RAG) lets agents access knowledge beyond their training data: your documents, your databases, real-time information.

→ Deep dive: RAG
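To make the idea concrete, here is a toy retrieval sketch: look up the most relevant spell by keyword overlap with the player's message. Real RAG systems use embeddings and vector search instead of word matching, but the shape (query in, relevant passage out, passage fed to the model) is the same:

```python
# Toy retrieval sketch: find the spell whose name best matches the query.
# Real RAG uses embeddings and vector search; the flow is identical.
spells = {
    "Feather Fall": "Creatures descend slowly, taking no falling damage.",
    "Fireball": "A bright streak blossoms into an explosion of flame.",
}

def retrieve(query: str) -> str:
    words = set(query.lower().split())
    best = max(spells, key=lambda name: len(words & set(name.lower().split())))
    return f"{best}: {spells[best]}"

print(retrieve("I cast Feather Fall as I tumble off the cliff"))
# → Feather Fall: Creatures descend slowly, taking no falling damage.
```

The retrieved text would then be injected into the model's context, grounding the narration in the actual rules rather than the model's hazy recollection of them.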

The Learning Path

This tutorial is structured around these five pillars. We'll start with the Model, then progressively add Tools, Retrieval, and Memory. By the capstone, you'll combine everything into a production-grade personal assistant.


Key Takeaways

You now have working code and a mental model. Here's how to use them:

  1. When an agent fails, ask: which pillar broke? Was the model reasoning poorly? Did a tool error? Is the context missing something? The five pillars give you a debugging framework.

  2. Start with tools, not prompts. The naive instinct is to perfect your system prompt. But the biggest wins usually come from giving the agent better tools—or better information to work with.

  3. Trust code, guide the model. Anything that must be correct (randomness, calculations, API calls) belongs in a tool. Let the model handle what it's good at: language, reasoning, creativity.


What's Next?

You've built a working agent. But why does the LLM decide to call a tool? What's actually happening when it "reasons"?

In the next chapter, we'll open the hood and explore LLM Fundamentals—how language models work, their capabilities and limitations, and the engineering concepts you need to use them effectively.


Going Further

The Web Interface

ADK includes a browser-based testing UI. From your project root:

adk web --port 8000

Open http://localhost:8000, select dungeon_master in the dropdown, and chat with your agent in a polished interface.

Development Only

The adk web interface is for local testing. We'll cover production deployment in later chapters.


Next: LLM Fundamentals — How language models think, and what that means for agents.