The Agent Ecosystem
In early 2024, building an AI agent meant stitching together a dozen incompatible tools. You'd write a function for Claude, rewrite it for GPT, pray your code execution didn't crash your server, and manually wire up every integration. By the end of 2025, something remarkable had happened: the chaos had organized itself into a coherent stack.
Three open protocols now form the backbone of modern agent infrastructure. Frameworks have matured from experimental toys into production-ready platforms. And a new category of infrastructure—sandboxes, model gateways, tool registries—has emerged to solve problems we didn't even know we had.
This chapter maps that ecosystem. Not to tell you what to use (the landscape shifts too fast for prescriptions), but to give you a mental model for understanding how the pieces fit together.
1. The Protocol Stack
The agent ecosystem has converged on three complementary protocols, each solving a different connectivity problem:
| Protocol | Purpose | Analogy |
|---|---|---|
| MCP (Model Context Protocol) | Agent ↔ Tools & Data | USB-C for peripherals |
| A2A (Agent-to-Agent) | Agent ↔ Agent | HTTP for web services |
| AG-UI (Agent-User Interaction) | Agent ↔ User Interface | WebSocket for real-time apps |
Think of it like the web stack. HTTP handles client-server communication, WebSocket enables real-time updates, and REST/GraphQL provides data access patterns. The agentic stack mirrors this: AG-UI handles the frontend connection, A2A enables service-to-service communication, and MCP provides the data/tool access layer.
Understanding this stack helps you reason about where different technologies fit. A framework like LangGraph operates at the agent runtime level. A sandbox like E2B operates at the tools level. A library like CopilotKit operates at the AG-UI level. They're not competitors—they're layers.
2. Unified Model Access
Before you can build agents, you need to talk to models. And talking to models used to be painful.
Every provider has its own SDK, its own message format, its own quirks. OpenAI uses `messages`, Anthropic uses `messages` with a different structure, Google uses `contents`. Want to switch providers? Rewrite your integration. Want to fall back to a cheaper model when the expensive one is overloaded? Build that routing logic yourself.
Unified model access layers solve this by providing a single interface that works across providers.
AI SDK (Vercel)
The AI SDK is a TypeScript toolkit that standardizes how you interact with LLMs. Write your code once, swap providers by changing a single line:
```typescript
import { generateText } from "ai";

// Switch providers by changing this one line
const { text } = await generateText({
  model: "anthropic/claude-sonnet-4", // or "openai/gpt-4o", "google/gemini-2.0-flash"
  prompt: "Explain quantum computing",
});
```

Beyond provider abstraction, the AI SDK provides streaming primitives, tool calling helpers, and UI hooks for React/Vue/Svelte. If you're building in the JavaScript ecosystem, it's become the de facto standard.
OpenRouter
OpenRouter takes a different approach: it's a proxy service rather than a library. You send requests to OpenRouter's API, and it routes them to the appropriate provider. This gives you:
- Unified billing across dozens of providers
- Automatic fallbacks when a model is down
- Rate limit pooling across multiple API keys
- Model discovery for new models as they launch
The tradeoff is latency (an extra network hop) and vendor lock-in to OpenRouter itself. But for teams that want to experiment across many models without managing a dozen API keys, it's compelling.
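Because OpenRouter exposes an OpenAI-compatible chat completions endpoint, a request is just an HTTP POST with a provider-prefixed model slug. The sketch below builds such a request as a pure function; the URL, header names, and model slug follow OpenRouter's documented conventions but should be treated as illustrative.

```typescript
// Sketch of an OpenRouter-style request. OpenRouter is OpenAI-compatible,
// so the payload shape mirrors the chat completions API; the endpoint URL
// and model slug here are illustrative.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildChatRequest(apiKey: string, model: string, prompt: string) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: {
      model, // e.g. "anthropic/claude-sonnet-4" — routing happens server-side
      messages: [{ role: "user", content: prompt }] as ChatMessage[],
    },
  };
}

// Usage:
// const req = buildChatRequest(key, "anthropic/claude-sonnet-4", "Hello");
// await fetch(req.url, { method: "POST", headers: req.headers,
//                        body: JSON.stringify(req.body) });
```

Swapping models means changing one string; OpenRouter picks the upstream provider and handles billing.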
LiteLLM
LiteLLM is the Python equivalent of the AI SDK—an open-source library that provides a unified interface across 100+ LLM providers. It's particularly popular in the data science and ML engineering communities where Python dominates.
When to Use What
If you're building a TypeScript application and want deep framework integration, use the AI SDK. If you want a managed service with routing and fallbacks, use OpenRouter. If you're in Python and want maximum provider coverage, use LiteLLM. If you're using a specific framework like Google ADK, it may have its own model abstraction built in.
AI Gateways
Beyond unified SDKs, a category of AI Gateways has emerged that sits between your application and the model providers. These aren't just routing layers—they provide production infrastructure.
Portkey offers caching (save money on repeated prompts), automatic retries with exponential backoff, fallback chains across providers, request logging, and spend tracking. It's particularly popular with teams that need enterprise-grade reliability.
Helicone focuses on observability—every request is logged with latency, tokens, cost, and custom metadata. It integrates with your existing LLM calls with a one-line proxy change.
Cloudflare AI Gateway provides similar capabilities (caching, rate limiting, analytics) but runs on Cloudflare's edge network, minimizing latency for global applications.
These gateways are complementary to unified SDKs. You might use AI SDK for provider abstraction in your code, and route all requests through Portkey for caching and reliability.
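The fallback behavior these gateways provide can be sketched in a few lines: try each provider in order and return the first success. The sketch below abstracts a provider call as an async function; a real gateway adds retries with backoff, timeouts, and caching on top of this core pattern.

```typescript
// Sketch of the fallback-chain pattern a gateway implements: try each
// provider in order, returning the first successful response.
type ProviderCall = (prompt: string) => Promise<string>;

async function withFallbacks(
  providers: ProviderCall[],
  prompt: string
): Promise<string> {
  let lastError: unknown;
  for (const call of providers) {
    try {
      return await call(prompt); // first success wins
    } catch (err) {
      lastError = err; // remember the failure, try the next provider
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```

In production you would also cap total latency and log each failure for observability—exactly the plumbing gateways sell.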
3. Agent Frameworks
You can build agents from scratch with raw API calls. Chapter 10 showed you how. But as your agents grow more complex—multi-step workflows, persistent state, parallel execution, error recovery—the scaffolding code starts to dominate.
Agent frameworks provide that scaffolding. They handle the agent loop, state management, tool execution, and often provide higher-level patterns like workflows and multi-agent orchestration.
The agent framework landscape is volatile. Libraries rise and fall with the hype cycle. The concepts we've taught in this tutorial—the ReAct loop, tool calling, context management—are stable. The frameworks that implement them are not. Evaluate carefully, and don't marry your architecture to any single library.
Google ADK (Agent Development Kit)
Google's ADK is what we've been using throughout this tutorial. It's a production-ready framework available in Python, TypeScript, Go, and Java, optimized for Gemini but compatible with other models.
Strengths: Deep integration with Google Cloud (Vertex AI, BigQuery, Cloud Run), built-in observability, native MCP and A2A support, visual workflow builder.
Best for: Teams already in the Google ecosystem, enterprise deployments, projects that need managed infrastructure.
LangGraph
LangGraph emerged from LangChain as a specialized tool for building stateful, graph-based agent workflows. While LangChain itself became bloated and controversial, LangGraph found a niche.
Strengths: Excellent visualization of complex workflows, strong support for cycles and conditional branching, good for research and experimentation.
Best for: Complex multi-step workflows where you need to visualize the execution graph, teams that want explicit control over state transitions.
CrewAI
CrewAI focuses on multi-agent collaboration through "role-playing." You define agents with specific personas and goals, and they work together on tasks.
Strengths: Intuitive mental model for team-based problems, good for content generation and research tasks, active community.
Best for: Problems that naturally decompose into roles (researcher, writer, editor), teams that want to get started quickly with multi-agent patterns.
Mastra
Mastra is a newer TypeScript-first framework that emphasizes developer experience and type safety. It has strong MCP integration and a focus on "workflows" as a first-class primitive.
Strengths: Excellent TypeScript experience, clean API design, good documentation, native MCP support.
Best for: TypeScript teams that want a modern, well-designed framework without the baggage of earlier generations.
Pydantic AI
From the creators of Pydantic (the validation library that powers FastAPI), Pydantic AI brings the same philosophy of "type safety and validation" to agent development.
Strengths: Excellent structured output handling, strong typing, good for teams that already use Pydantic.
Best for: Python teams that prioritize type safety and validation, applications where structured output is critical.
LlamaIndex
LlamaIndex started as a RAG framework but has expanded into agent territory. Its strength remains in data ingestion and retrieval.
Strengths: Best-in-class RAG capabilities, extensive data connector library, good for knowledge-intensive applications.
Best for: Applications where retrieval is the primary agent capability, teams building "talk to your data" products.
The Meta-Pattern
Despite their differences, all these frameworks implement the same core patterns: the agent loop, tool calling, state management, and orchestration. If you understand the fundamentals, you can move between frameworks. If you only understand the framework, you're stuck when it changes.
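That shared core—the agent loop—fits in a dozen lines. The sketch below injects the model as a function so it runs without any API, and uses illustrative types rather than any framework's actual interfaces: ask the model, execute any tool it requests, feed the observation back, repeat.

```typescript
// Minimal sketch of the agent loop every framework implements. The model is
// injected as a function and the types are illustrative, not any real SDK.
type ModelStep =
  | { type: "tool_call"; tool: string; args: string }
  | { type: "final"; text: string };

type Model = (history: string[]) => ModelStep;
type Tools = Record<string, (args: string) => string>;

function runAgentLoop(model: Model, tools: Tools, task: string, maxSteps = 10): string {
  const history = [task];
  for (let i = 0; i < maxSteps; i++) {
    const step = model(history);
    if (step.type === "final") return step.text;
    const result = tools[step.tool]?.(step.args) ?? `unknown tool: ${step.tool}`;
    history.push(`${step.tool} -> ${result}`); // observation goes back into context
  }
  return "max steps reached";
}
```

Everything else a framework offers—state persistence, parallelism, retries—is scaffolding around this loop.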
4. Low-Code Agent Builders
Not everyone building agents writes code. And even developers sometimes want to prototype visually before committing to implementation.
Low-code agent builders provide drag-and-drop interfaces for constructing agent workflows. They're useful for:
- Rapid prototyping and experimentation
- Non-technical team members who need to build simple automations
- Demos and proof-of-concepts
- Debugging complex flows visually
Flowise
Flowise is an open-source UI for building LangChain flows. You drag nodes (LLMs, tools, memory, chains) onto a canvas and connect them. It's useful for visualizing what your code will do before you write it, and for teams transitioning from no-code to code.
Langflow
Similar to Flowise but with a cleaner interface and better support for exporting to production code. DataStax acquired Langflow and has been investing in enterprise features.
Dify
Dify positions itself as an "LLMOps platform"—beyond just building agents, it handles prompt management, dataset management, and observability. It's more opinionated but provides a more complete workflow.
n8n (with AI nodes)
n8n is a general-purpose workflow automation tool (think Zapier, but self-hostable). Its AI nodes let you add LLM capabilities to existing automation workflows. This is compelling if your "agent" is really just an automation that sometimes needs intelligence.
The Tradeoffs
Low-code tools are great for starting, but they hit walls. Complex branching logic becomes spaghetti. Custom integrations require escape hatches to code. Performance tuning is limited. Most teams use them for prototyping, then reimplement in code for production.
5. Secure Execution: Sandboxes
Here's a rule that should be tattooed on every agent developer's forearm:
Never execute agent-generated code on your production server.
When an agent writes code—whether it's a Python script for data analysis, a shell command for system administration, or JavaScript for a web scraper—that code is untrusted. It might contain bugs. It might contain prompt-injected malicious instructions. It might simply consume all your resources.
Sandboxes provide isolated execution environments where agent-generated code can run safely.
E2B
E2B provides cloud-hosted sandboxes specifically designed for AI agents. Each sandbox is an isolated micro-VM that boots in about 150 ms. You can think of it as a disposable computer for each agent task.
The typical workflow:
- Your agent decides it needs to run code
- You spin up an E2B sandbox
- The agent's code executes in isolation
- You retrieve the results (files, stdout, etc.)
- The sandbox is destroyed
E2B provides filesystem access, network capabilities (with controls), and support for multiple languages. It's become the go-to solution for code interpreter tools.
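The create-run-destroy lifecycle above is worth encoding as a pattern: the sandbox must be destroyed even when the agent's code throws. The sketch below uses a generic `Sandbox` interface—not E2B's actual SDK—to show the shape.

```typescript
// Sketch of the sandbox lifecycle: create, run untrusted code, always destroy.
// The Sandbox interface here is generic, not E2B's actual SDK.
interface Sandbox {
  runCode(code: string): Promise<string>;
  destroy(): Promise<void>;
}

async function withSandbox<T>(
  create: () => Promise<Sandbox>,
  fn: (sb: Sandbox) => Promise<T>
): Promise<T> {
  const sandbox = await create();
  try {
    return await fn(sandbox);
  } finally {
    await sandbox.destroy(); // sandboxes are disposable — never reuse one
  }
}
```

The `finally` block is the point: leaked sandboxes cost money and, worse, accumulate untrusted state.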
Cloudflare Workers (and Workers AI)
Cloudflare's edge computing platform offers a different model. Rather than VMs, Workers run in V8 isolates—lightweight JavaScript execution contexts that start in milliseconds and have strict resource limits.
For agents that primarily need to run JavaScript or interact with web APIs, Workers provide a more lightweight (and often cheaper) option than full VMs. Cloudflare has also added Workers AI for running inference at the edge, and Durable Objects for stateful agent coordination.
When to Use Each
E2B excels when you need full VM capabilities: arbitrary languages, filesystem access, long-running processes. Cloudflare Workers excels when you need lightweight, fast, JavaScript-focused execution at global scale.
Some teams use both: Workers for quick data transformations and API orchestration, E2B for heavy-duty code execution.
6. Browser and Computer Use
Some agent tasks require interacting with the visual world—clicking buttons, filling forms, navigating websites, or even controlling desktop applications. This is the domain of computer use agents.
Anthropic Computer Use
In late 2024, Anthropic released "computer use" capabilities for Claude—the model can see screenshots and generate mouse/keyboard actions. This opened a new category of agents that can automate any GUI-based workflow.
The typical pattern:
- Capture a screenshot of the current screen
- Send it to Claude with instructions ("Click the Submit button")
- Claude returns coordinates and action type
- Your code executes the action
- Repeat
This is powerful but slow (each action requires an API call with image upload) and expensive. It's best for workflows that can't be automated any other way.
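The screenshot-action loop above can be sketched as follows. The vision model is injected as a function so the sketch runs standalone; the action names and shapes are illustrative, not Anthropic's actual API.

```typescript
// Sketch of the computer-use loop: capture, ask the model, execute, repeat.
// Action shapes and the model interface are illustrative, not Anthropic's API.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

type VisionModel = (screenshot: Uint8Array, instruction: string) => Action;

function runComputerUse(
  model: VisionModel,
  capture: () => Uint8Array,
  execute: (a: Action) => void,
  instruction: string,
  maxActions = 20
): number {
  for (let i = 0; i < maxActions; i++) {
    const action = model(capture(), instruction); // each step costs an API call
    if (action.kind === "done") return i;
    execute(action);
  }
  return maxActions; // safety cap: never let the loop run unbounded
}
```

The per-step API call with an image upload is why this loop is slow and expensive; the `maxActions` cap is essential.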
Playwright and Puppeteer MCP Servers
For web automation specifically, browser control libraries like Playwright and Puppeteer are more efficient than computer use. The community has built MCP servers that expose browser automation as tools:
- Navigate to URLs
- Click elements by selector
- Fill forms
- Extract text and screenshots
- Wait for elements to appear
Your agent reasons about what to do, and the MCP server handles how to do it in the browser.
Browser-as-a-Service
Browserbase, Steel, and Browserless provide cloud-hosted browsers specifically designed for automation. They handle the infrastructure headaches: browser versions, proxies, CAPTCHAs, and scaling. Your agent connects to their API and controls a browser without managing any infrastructure.
When to Use What
- Playwright/Puppeteer MCP: Web automation where you can identify elements programmatically
- Computer Use: Desktop apps, complex UIs where selectors don't work, or "human-like" interaction is required
- Browser-as-a-Service: When you need scale, reliability, or features like proxy rotation
7. The MCP Ecosystem
We covered MCP in depth in Chapter 7, so we won't repeat the protocol details here. But the ecosystem around MCP has exploded since then, and it's worth understanding the landscape.
Recap: What MCP Does
MCP (Model Context Protocol) standardizes how AI applications connect to external tools and data sources. Write a tool once as an MCP server, and any MCP-compatible client can use it—Claude Desktop, VS Code Copilot, Cursor, your custom agent, and dozens more.
MCP Registries and Discovery
As the number of MCP servers has grown, discovery has become a challenge. Where do you find an MCP server for your database? How do you know it's trustworthy?
Smithery.ai has emerged as the primary public registry for MCP servers. It catalogs hundreds of community-built servers across categories: databases, APIs, file systems, developer tools, and more. Each listing includes documentation, installation instructions, and often source code links.
mcp.run provides a hosted MCP gateway—you can run MCP servers in the cloud without managing infrastructure, and access them from any client.
Composio takes a different approach, offering pre-built integrations with 100+ SaaS tools (Slack, GitHub, Notion, etc.) exposed via MCP. If you need to connect your agent to common business tools, Composio can save weeks of integration work.
The MCP Server Ecosystem
The community has built MCP servers for almost everything:
- Databases: PostgreSQL, MySQL, SQLite, MongoDB, Supabase
- Developer Tools: GitHub, GitLab, Jira, Linear, Sentry
- Productivity: Google Drive, Notion, Slack, Gmail
- Data Sources: Web scraping, API connectors, file systems
- Infrastructure: AWS, GCP, Kubernetes, Docker
Google ADK, the AI SDK, and most major frameworks now have native MCP support. The protocol has achieved what it set out to do: write once, use everywhere.
MCP vs. Native Tool Calling
Should you use MCP servers or implement tools directly in your agent? The answer depends on reusability. If a tool is specific to your application, implement it directly. If a tool is generic (database access, API integration), use or build an MCP server—you'll be able to reuse it across projects and share it with the community.
OpenAPI-to-Tools Generation
One of the most powerful patterns in the MCP ecosystem is automatic tool generation from OpenAPI specs. If a service has an OpenAPI/Swagger definition (and most modern APIs do), you can generate an MCP server automatically.
Several tools support this:
- Stainless generates type-safe SDKs and MCP servers from OpenAPI specs
- Speakeasy focuses on SDK generation with MCP output support
- ADK's OpenAPI tools can import OpenAPI specs directly as agent tools
This dramatically reduces integration time. Instead of manually coding each API endpoint as a tool, you point at the OpenAPI spec and get dozens of tools instantly.
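The transformation itself is mechanical: walk the spec's paths and emit one tool definition per operation. The sketch below models only a minimal slice of an OpenAPI document; real generators also map parameter schemas to tool input schemas.

```typescript
// Sketch of OpenAPI-to-tools generation: one tool per (path, method)
// operation. Only a minimal slice of the spec format is modeled here.
interface OpenApiSpec {
  paths: Record<string, Record<string, { operationId: string; summary?: string }>>;
}

interface ToolDef {
  name: string;
  description: string;
  method: string;
  path: string;
}

function specToTools(spec: OpenApiSpec): ToolDef[] {
  const tools: ToolDef[] = [];
  for (const [path, ops] of Object.entries(spec.paths)) {
    for (const [method, op] of Object.entries(ops)) {
      tools.push({
        name: op.operationId, // becomes the tool name the model calls
        description: op.summary ?? `${method.toUpperCase()} ${path}`,
        method,
        path,
      });
    }
  }
  return tools;
}
```

A spec with thirty endpoints yields thirty tools in one pass, which is exactly why this pattern collapses integration time.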
8. Agent-to-User Interaction (AG-UI)
Traditional chatbots return text. Modern agents need richer interactions: streaming responses, interactive UI components, real-time state synchronization, and human-in-the-loop workflows.
AG-UI (Agent-User Interaction Protocol) standardizes this communication layer. It's an event-based protocol that connects agentic backends to frontend applications, enabling:
- Streaming chat: Real-time token streaming with cancel and resume
- Generative UI: Agents that return structured UI components, not just text
- Shared state: Bidirectional state synchronization between agent and frontend
- Human-in-the-loop: Pause, approve, edit, or escalate mid-execution
The Generative UI Pattern
Consider a travel booking agent. In a traditional text-based interface, it might respond:
"I found 3 flights from NYC to London. Option 1: British Airways, $450, departs 7pm. Option 2: ..."
With generative UI, it returns structured data that your frontend renders as an interactive component—cards with images, prices, and "Book Now" buttons. The user can sort, filter, and select without typing another message.
This pattern is especially powerful for:
- Data visualization (charts, graphs, maps)
- Multi-option selection (products, flights, hotels)
- Form filling (shipping addresses, payment details)
- Interactive workflows (approval flows, document reviews)
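Concretely, the flight example becomes a typed payload instead of prose. The component name and field names below are illustrative—whatever contract your frontend and agent agree on.

```typescript
// Sketch of a generative UI payload: the agent emits structured data and the
// frontend renders it. The component name and fields are illustrative.
interface FlightOption {
  airline: string;
  price: number;
  departs: string;
}

interface UiComponent {
  component: "flight-picker";
  props: { options: FlightOption[] };
}

function flightsToUi(options: FlightOption[]): UiComponent {
  return { component: "flight-picker", props: { options } };
}

// The frontend maps `component` to a React/Vue component and renders
// interactive cards with "Book Now" buttons instead of a wall of text.
```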
AG-UI in Practice
AG-UI was born from CopilotKit, a React library for building AI-powered interfaces. The protocol has since been adopted by major frameworks including LangGraph, CrewAI, Google ADK, and Mastra.
The integration typically works like this:
- Your agent backend emits AG-UI events (text chunks, tool calls, UI components)
- A middleware layer streams these events to the frontend
- Your frontend (using CopilotKit or a custom implementation) renders the appropriate UI
AG-UI complements MCP and A2A: MCP gives agents tools, A2A lets agents talk to each other, and AG-UI brings agents into user-facing applications.
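On the frontend, consuming such an event stream usually means folding events into renderable state. The reducer below is a sketch with illustrative event names, not the exact AG-UI wire format.

```typescript
// Sketch of the frontend side of an AG-UI-style stream: a reducer folds
// incoming events into UI state. Event names are illustrative, not the
// exact AG-UI wire format.
type AgentEvent =
  | { type: "text_chunk"; delta: string }
  | { type: "tool_call"; name: string }
  | { type: "ui_component"; component: string };

interface UiState {
  text: string;
  pendingTools: string[];
  components: string[];
}

const initialState: UiState = { text: "", pendingTools: [], components: [] };

function reduceEvent(state: UiState, event: AgentEvent): UiState {
  switch (event.type) {
    case "text_chunk":
      return { ...state, text: state.text + event.delta }; // streaming append
    case "tool_call":
      return { ...state, pendingTools: [...state.pendingTools, event.name] };
    case "ui_component":
      return { ...state, components: [...state.components, event.component] };
  }
}
```

This reducer shape is why AG-UI integrates naturally with React-style frontends: events in, state out, render.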
9. Agent-to-Agent Communication (A2A)
In Chapter 12, we explored multi-agent patterns: supervisors coordinating workers, hierarchical organizations, peer-to-peer collaboration. Those patterns assumed all agents lived in the same process.
But what happens when agents need to communicate across network boundaries? When the billing agent is a microservice maintained by a different team? When you want to use a specialized third-party agent for financial analysis?
A2A (Agent-to-Agent Protocol) is the open standard for this inter-agent communication.
When to Use A2A vs. Local Sub-Agents
A2A adds network overhead. Don't use it when a local function call would suffice.
Use A2A when:
- The agent is a separate, independently deployed service
- Different teams or organizations maintain different agents
- Agents are written in different languages
- You need a formal contract between components
Use local sub-agents when:
- Agents are internal code organization within one application
- You need low-latency, high-frequency interactions
- Agents share memory or context directly
The A2A Workflow
The A2A pattern has two sides:
Exposing an agent: You wrap your agent in an A2A server, making it accessible over the network. Other agents can discover your agent (through a registry or direct URL), authenticate, and send requests.
Consuming an agent: You create a client proxy that knows how to communicate with remote A2A agents. From your code's perspective, calling the remote agent feels like calling a local function—the protocol handles serialization, network transport, and error handling.
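The consuming side can be sketched as a thin proxy: the remote agent looks like a local async function, with the envelope and transport hidden behind it. The field names below are illustrative, not the exact A2A wire format.

```typescript
// Sketch of an A2A client proxy: the remote agent presents as a local async
// function. Envelope fields are illustrative, not the A2A wire format.
interface TaskRequest {
  task: string;
  input: string;
}

type Transport = (url: string, payload: TaskRequest) => Promise<string>;

function makeAgentProxy(url: string, transport: Transport) {
  return {
    async call(task: string, input: string): Promise<string> {
      // Serialization, auth, and retries would live here in a real client.
      return transport(url, { task, input });
    },
  };
}

// Usage (hypothetical endpoint):
// const inventory = makeAgentProxy("https://inventory.internal/a2a", httpTransport);
// const stock = await inventory.call("check_stock", "SKU-123");
```

Injecting the transport keeps the proxy testable and lets you swap HTTP for another channel without touching call sites.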
A2A in the Real World
Imagine an e-commerce platform with specialized agents:
- Product Search Agent (maintained by the catalog team)
- Inventory Agent (maintained by the warehouse team)
- Shipping Agent (maintained by logistics)
- Customer Service Agent (the user-facing orchestrator)
The Customer Service Agent uses A2A to query the others. Each team can deploy, update, and scale their agent independently. The A2A protocol ensures they can communicate even if they're written in different languages or hosted on different infrastructure.
10. Memory-as-a-Service
Chapter 9 covered memory patterns—short-term, long-term, episodic. But implementing robust memory from scratch is surprisingly complex: you need vector storage, retrieval logic, memory consolidation, and garbage collection.
Memory-as-a-Service providers handle this infrastructure so you can focus on your agent's logic.
Mem0
Mem0 (pronounced "mem-zero") provides a managed memory layer for AI agents. You store memories with simple API calls, and Mem0 handles:
- Automatic embedding and vector storage
- Intelligent retrieval based on relevance and recency
- Memory consolidation (merging similar memories)
- User and session scoping
The API is simple: `add()` memories, `search()` for relevant ones, and Mem0 figures out the vector operations.
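What happens behind that add/search interface can be sketched locally: store entries, then score them by relevance and recency at query time. Real services use vector embeddings; keyword overlap stands in here, and none of this is Mem0's actual API.

```typescript
// Sketch of what a memory service does behind add()/search(): score stored
// memories by relevance plus recency. Keyword overlap stands in for vector
// similarity; this is not Mem0's actual API.
interface Memory {
  text: string;
  timestamp: number;
}

class MemoryStore {
  private memories: Memory[] = [];

  add(text: string, timestamp: number): void {
    this.memories.push({ text, timestamp });
  }

  search(query: string, now: number, limit = 3): string[] {
    const terms = query.toLowerCase().split(/\s+/);
    return this.memories
      .map((m) => {
        const overlap = terms.filter((t) => m.text.toLowerCase().includes(t)).length;
        const recency = 1 / (1 + (now - m.timestamp)); // newer memories score higher
        return { m, score: overlap + recency };
      })
      .filter((s) => s.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
      .map((s) => s.m.text);
  }
}
```

The hard parts a service solves—embedding quality, consolidation of near-duplicates, scoping per user—are exactly what this toy omits.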
Zep
Zep focuses on conversation memory specifically. It stores chat histories, extracts facts and entities, and provides temporal awareness ("what did the user say last week?"). It's particularly good for chatbots that need to remember across sessions.
Zep also supports knowledge graphs—extracting entities and relationships from conversations and storing them in a queryable graph structure.
When to Build vs. Buy
If memory is a core differentiator for your product, build it yourself. You'll want full control over retrieval logic, consolidation strategies, and storage.
If memory is table stakes (your agent just needs to remember user preferences), use a service. The engineering effort to build robust memory infrastructure is substantial, and services like Mem0 and Zep have solved the hard problems.
11. Agent-as-a-Service
So far, we've discussed building agents. But sometimes the right answer is to use an agent someone else built.
Agent-as-a-Service providers offer specialized agents accessible via API. Instead of building a coding agent from scratch, you call Devin's API. Instead of building a legal research agent, you use Harvey.
The Landscape
- Devin (Cognition): Autonomous software engineering agent. Give it a GitHub issue, get back a pull request.
- Harvey: Legal AI for contract analysis, research, and document review.
- Glean: Enterprise search and knowledge agent that connects to your company's data.
- Perplexity: Research agent with real-time web access and citation.
- MultiOn: Browser automation agent for web tasks.
The Buy vs. Build Decision
Buy (use Agent-as-a-Service) when:
- The domain requires deep specialization you can't match (legal, medical, security)
- Speed to market matters more than customization
- The agent is a supporting feature, not your core product
Build when:
- The agent is your core product or key differentiator
- You need deep integration with proprietary data/systems
- Cost at scale makes API pricing prohibitive
- You need full control over behavior and safety
Hybrid Approaches
Many teams use both. Your custom orchestrator agent might call Perplexity for research, Devin for code changes, and your own specialized agents for domain-specific tasks. A2A makes this composition natural—each service is just another agent your system can invoke.
12. Putting It Together
Step back and consider how all these pieces fit in a production agent system: AG-UI connects the frontend, an agent framework runs the core loop, MCP connects tools and data, A2A links agents across service boundaries, and model gateways and sandboxes sit underneath for unified model access and safe execution.
Not every system needs every layer. A simple chatbot might skip AG-UI and use basic HTTP. An internal tool might not need A2A. A batch processing system might skip the UI entirely.
The value of understanding the ecosystem is knowing what's available when you need it. When your users start requesting richer interactions, you know AG-UI exists. When your tool library becomes unmaintainable, you know MCP can help. When your monolithic agent needs to become a distributed system, you know A2A is the answer.
The Landscape in Motion
If there's one thing to take away from this chapter, it's that the ecosystem is converging. The protocol stack (MCP, A2A, AG-UI) provides stable interfaces. Frameworks are maturing and interoperating. Infrastructure categories (sandboxes, model gateways, registries) are becoming well-defined.
This is good news for practitioners. It means the skills you're learning are portable. It means the tools you build today have a better chance of working tomorrow. It means the mental models we've established—the agent loop, tool calling, context engineering—are the right foundation.
The specific frameworks will continue to evolve. New providers will emerge. Better sandboxes will appear. But the architecture? That's stabilizing. And that's what makes this the right time to be learning agent engineering.
Next: Capstone Project