OpenAI's Agents SDK is the company's production framework for building agentic AI applications. Released in March 2025 as a direct successor to the experimental Swarm project, the SDK provides a structured set of primitives—agents, tools, handoffs, guardrails, and tracing—that let developers build multi-agent systems on top of OpenAI's model APIs. By mid-2026, the SDK has reached v0.17 with both Python and TypeScript implementations, voice agent support, sandbox orchestration, and an expanding hosted-tool catalog.
Developers searching for an openai agents sdk review typically fall into two groups. The first group has already experimented with Swarm and wants to know what changed. The second group is evaluating agent frameworks against LangGraph, CrewAI, Google ADK, or the Vercel AI SDK and needs a concrete picture of what the OpenAI option actually provides—and where it stops providing. This review covers both angles with specific technical details rather than surface-level impressions.
What follows is a structured breakdown of the SDK's architecture, capabilities, tooling, pricing model, and honest limitations. If you are building production agent systems and considering the OpenAI ecosystem, this gives you what you need to make that decision.
From Swarm to Agents SDK: The Evolution
OpenAI shipped Swarm in October 2024 as a deliberately minimal, deliberately experimental multi-agent framework. The entire codebase was under 1,000 lines of Python. The README explicitly stated it was not intended for production use. Despite that disclaimer, Swarm accumulated over 20,000 GitHub stars because the core abstractions—agents, function tools, and handoffs—were genuinely well-designed and easy to reason about.
In March 2025, OpenAI replaced Swarm with the Agents SDK (openai-agents-python). The Swarm repository's README now redirects developers to the Agents SDK, stating it is the "production-ready evolution of Swarm." The mental model carries over: agents are instruction-driven entities with access to tools and the ability to hand off to other agents. What changed is everything around that core: guardrails, tracing, hosted tools, sessions, the Responses API integration, voice pipeline support, and an actual release cadence with semantic versioning.
A TypeScript/JavaScript implementation (openai-agents-js) followed in 2025, bringing the same primitives to Node.js and browser environments. By May 2026, the SDK reached v0.17.1 with additions including configurable memory, sandbox-aware orchestration, and standardized MCP (Model Context Protocol) integrations.
Core Primitives: What the SDK Actually Provides
The Agents SDK is built around four core primitives plus an observability layer. Understanding these is essential to evaluating whether the framework fits your architecture.
1. Agent
An Agent is defined by three things: instructions (a system prompt), tools (functions it can call), and handoffs (other agents it can delegate to). Agents also specify which model they use, and different agents within the same application can use different models. A triage agent might run on gpt-4o-mini for speed while a reasoning agent uses o3 for complex analysis.
2. Runner
The Runner manages the agent execution loop. It calls the model, processes tool calls, handles handoffs between agents, and enforces guardrails. There are three execution modes: Runner.run() for complete execution, Runner.run_sync() for synchronous contexts, and Runner.run_streamed() for streaming token-by-token responses. The Runner handles the agentic loop internally—you provide the agent and input, it manages the rest.
3. Guardrails
Guardrails provide input and output validation that runs alongside agent execution. Input guardrails validate user messages before the agent processes them. Output guardrails validate the agent's final response before it reaches the user. You can write custom guardrail functions that use a lightweight LLM call to classify content, check for policy violations, or validate response format. Tool guardrails run on every custom function-tool invocation. Importantly, hosted tools (WebSearchTool, FileSearchTool, CodeInterpreterTool) and handoff calls bypass the tool guardrail pipeline.
4. Handoffs
Handoffs allow an agent to delegate tasks to another specialized agent within the same run. Under the hood, handoffs are exposed to the LLM as tool calls—a handoff to "Refund Agent" appears as a transfer_to_refund_agent tool. When the model invokes this tool, the Runner switches the active agent, carries over conversation context, and continues execution. This is the same pattern Swarm pioneered, now with production guardrails and tracing attached.
5. Tracing
The SDK includes built-in tracing that is enabled by default. Every agent run generates a comprehensive trace record capturing LLM generations, tool calls, handoffs, guardrail checks, and custom events. These traces are viewable in the OpenAI Traces dashboard for debugging and monitoring. For teams that need integration with existing observability stacks, the SDK supports OpenTelemetry export, letting you ship trace data to Datadog, Honeycomb, or any OTLP-compatible backend.
Built-in and Hosted Tools
The SDK distinguishes between hosted tools (executed server-side by OpenAI) and local execution tools (executed in your environment).
Hosted tools run on OpenAI's infrastructure and include:
- WebSearchTool — Real-time web search with result citations
- FileSearchTool — Semantic search over uploaded files using vector stores
- CodeInterpreterTool — Sandboxed Python execution for data analysis and computation
- ImageGenerationTool — DALL-E image generation within agent workflows
- HostedMCPTool — Remote MCP server integration executed by OpenAI's API
Local execution tools run in your environment:
- ComputerTool — Computer-use automation (screenshots, clicks, keyboard input)
- ShellTool — Shell command execution
- ApplyPatchTool — File patching for code modifications
- LocalShellTool — Local shell access for development workflows
- Custom function tools — Any Python or TypeScript function with automatic schema generation via type hints
Capability Table
| Capability | Status | Details |
|---|---|---|
| Multi-agent handoffs | Native | Agents delegate to specialized agents; context carried automatically |
| Input/output guardrails | Native | Custom validation functions, LLM-based classification supported |
| Tracing & observability | Native | Enabled by default; OpenTelemetry export for external backends |
| Web search | Hosted tool | Real-time search with citations; runs on OpenAI infrastructure |
| File search (RAG) | Hosted tool | Vector-store-based semantic search over uploaded documents |
| Code interpreter | Hosted tool | Sandboxed Python execution for computation and data analysis |
| Computer use | Local tool | Screenshot, click, type automation via CUA models |
| Voice agents | Supported | Python VoicePipeline; TypeScript RealtimeAgent/RealtimeSession |
| Streaming | Native | Token-by-token streaming via Runner.run_streamed() |
| Sessions / memory | Configurable | Auto conversation history; configurable memory added in 2026 |
| MCP integration | Supported | Both hosted and local MCP server connections |
| Sandbox orchestration | Supported | Isolated execution environments for agent workloads |
| Python SDK | Stable | openai-agents-python, v0.17+ as of May 2026 |
| TypeScript SDK | Stable | openai-agents-js, same primitive set |
| OpenAI model support | Full | GPT-4o, GPT-4o-mini, o3, o4-mini, gpt-realtime-2 |
| Non-OpenAI models | Limited | Via Model interface adapters; hosted tools require OpenAI models |
| State persistence | Not built-in | No native checkpointing; developers must implement persistence |
| Parallel agent execution | Not built-in | Handoff chains are sequential; parallel requires custom orchestration |
| Graph-based workflows | Not supported | Linear handoff chains only; no DAG or cycle support |
Model Support and Provider Flexibility
The SDK is designed around OpenAI's model APIs, with GPT-4o, GPT-4o-mini, o3, o4-mini, and gpt-realtime-2 as first-class options. Different agents in the same application can use different models, letting you optimize cost and capability per task.
For non-OpenAI models, the SDK provides a Model interface that can be implemented as an adapter. The documentation describes the SDK as supporting "100+ LLMs" through this mechanism. In practice, the adapter approach works for basic agent loops (instructions, tool calls, handoffs), but hosted tools like WebSearchTool and FileSearchTool only work with OpenAI models. If you route an agent through an external model adapter, you lose access to the hosted tool catalog and must provide equivalent functionality through custom function tools.
Voice Agent Support
The SDK integrates with OpenAI's Realtime API for voice agent applications. The implementation differs by language:
- Python: The
VoicePipelineclass wraps an existing text agent and adds speech-to-text and text-to-speech layers. This is the quickest path to voice-enabling an agent you have already built. - TypeScript:
RealtimeAgentandRealtimeSessionprovide a browser-native voice assistant pattern with automatic interruption detection, context management, and the same guardrails and handoff patterns available in text agents.
Voice agents use the gpt-realtime-2 model and support function tools and hosted MCP tools within real-time sessions. This makes the SDK a viable option for building voice-based customer service agents, phone systems, or interactive voice assistants.
Pricing: What It Actually Costs
The Agents SDK itself is free and open-source under the MIT license. There is no licensing fee for the framework. The cost comes entirely from OpenAI API usage:
- Model inference: Standard per-token pricing for whichever models your agents use (GPT-4o, o3, etc.)
- Hosted tools: WebSearchTool, FileSearchTool, and CodeInterpreterTool have their own per-use pricing on top of token costs
- Tracing: The OpenAI Traces dashboard is included with API access at no additional charge
- Voice: Realtime API pricing applies for voice agent sessions (per-minute audio pricing)
For teams already committed to OpenAI's API, the SDK adds no marginal framework cost. For teams using external models through adapters, the SDK is still free, but you lose the hosted tools that represent a significant portion of the SDK's value proposition.
Where the OpenAI Agents SDK Falls Short
No framework is the right choice for every project. Here are five specific scenarios where the OpenAI Agents SDK creates friction or fails to deliver.
1. OpenAI Lock-in Is a Real Constraint
The SDK's most powerful features—hosted tools, tracing dashboard, voice agents, sessions—only work with OpenAI models. The external model adapter path exists, but it strips away hosted tools and reduces the SDK to a basic agent loop. If your architecture requires model portability (switching between Anthropic, Google, or open-source models based on cost or capability), the SDK's adapter story is too thin. Frameworks like LangGraph or the Vercel AI SDK provide first-class multi-provider support without capability degradation.
2. No Built-in State Persistence or Checkpointing
The SDK does not include native state persistence or checkpointing. If your workflow needs to pause for human approval and resume hours later, you are building that infrastructure yourself. If a long-running agent process crashes mid-execution, there is no checkpoint to recover from. LangGraph provides checkpointing natively, letting you save and restore agent state at arbitrary points. For workflows that involve human-in-the-loop approvals, multi-day execution spans, or crash recovery, the SDK's absence of persistence is a significant gap.
3. Linear Handoffs, Not Graph-Based Orchestration
The handoff pattern is elegant for linear delegation chains: triage agent hands off to specialist agent, specialist hands off to escalation agent. But it does not support parallel agent execution, conditional branching, or cyclical workflows natively. If you need to fan out work to three agents simultaneously and merge their results, or define complex DAG-based workflows with conditional routing, you are either building custom orchestration on top of the SDK or choosing the wrong framework. LangGraph and similar tools provide graph-based workflow definitions that handle these patterns natively.
4. Enterprise Features Are Thin
For production deployments at scale, the SDK lacks several features enterprise teams expect: there is no built-in retry logic or fallback routing for failed tool calls. Rate limiting, circuit breakers, and graceful degradation are left to the developer. Agent-to-agent communication across service boundaries (A2A protocol) has limited integration. Audit logging beyond tracing, role-based access control for agent capabilities, and multi-tenant isolation are not addressed. Teams building mission-critical agent systems will find themselves writing substantial infrastructure code around the SDK.
5. The Handoff Pattern Has a Scaling Ceiling
The handoff model works well with a handful of specialized agents. Reports from teams using the SDK indicate the pattern becomes unwieldy beyond eight to ten agent types. Each handoff is exposed as a tool to the LLM, and as the number of possible handoff targets grows, the model's ability to select the correct agent degrades. For systems requiring dozens of specialized agents, a routing layer or hierarchical agent architecture becomes necessary—infrastructure the SDK does not provide.
The Bottom Line
The OpenAI Agents SDK is a well-designed, production-ready framework for building multi-agent applications within the OpenAI ecosystem. Its core primitives—agents, handoffs, guardrails, and tracing—are genuinely well-conceived and minimal enough to learn quickly without sacrificing power. The hosted tool catalog (web search, file search, code interpreter) provides significant out-of-the-box capability. Voice agent support via the Realtime API adds a dimension most competing frameworks lack.
The SDK is the right choice if you are already committed to OpenAI's model APIs, need multi-agent coordination with handoffs, want built-in observability, and your workflows are primarily linear delegation chains with fewer than ten agent types. It is particularly strong for customer service applications, voice-enabled agents, and rapid prototyping of agentic systems.
It is not the right choice if you need model portability across providers, complex graph-based workflows, built-in state persistence, or enterprise infrastructure features like retry policies and circuit breakers. For those requirements, evaluate Claude and the Anthropic ecosystem, LangGraph for stateful graph workflows, or the Cursor IDE for AI-assisted development with multi-model support.
The honest assessment: the OpenAI Agents SDK is the best framework for building agents if and only if you have decided OpenAI models are your foundation. That is a reasonable decision for many teams. Just make it deliberately, because the framework's value proposition is tightly coupled to that choice.
This article may contain affiliate links. If you purchase through these links, we may earn a commission at no additional cost to you. See our full disclosure for details.