OpenAI Agents SDK Review (2026): Primitives, Tracing, Handoffs, and Where It Falls Short

OpenAI Agents SDK Review (2026): Primitives, Tracing, Handoffs, and Where It Falls Short
This site contains affiliate links. We may earn a commission at no extra cost to you. How we review →

OpenAI's Agents SDK is the company's production framework for building agentic AI applications. Released in March 2025 as a direct successor to the experimental Swarm project, the SDK provides a structured set of primitives—agents, tools, handoffs, guardrails, and tracing—that let developers build multi-agent systems on top of OpenAI's model APIs. By mid-2026, the SDK has reached v0.17 with both Python and TypeScript implementations, voice agent support, sandbox orchestration, and an expanding hosted-tool catalog.

Developers searching for an openai agents sdk review typically fall into two groups. The first group has already experimented with Swarm and wants to know what changed. The second group is evaluating agent frameworks against LangGraph, CrewAI, Google ADK, or the Vercel AI SDK and needs a concrete picture of what the OpenAI option actually provides—and where it stops providing. This review covers both angles with specific technical details rather than surface-level impressions.

What follows is a structured breakdown of the SDK's architecture, capabilities, tooling, pricing model, and honest limitations. If you are building production agent systems and considering the OpenAI ecosystem, this gives you what you need to make that decision.

From Swarm to Agents SDK: The Evolution

OpenAI shipped Swarm in October 2024 as a deliberately minimal, deliberately experimental multi-agent framework. The entire codebase was under 1,000 lines of Python. The README explicitly stated it was not intended for production use. Despite that disclaimer, Swarm accumulated over 20,000 GitHub stars because the core abstractions—agents, function tools, and handoffs—were genuinely well-designed and easy to reason about.

In March 2025, OpenAI replaced Swarm with the Agents SDK (openai-agents-python). The Swarm repository's README now redirects developers to the Agents SDK, stating it is the "production-ready evolution of Swarm." The mental model carries over: agents are instruction-driven entities with access to tools and the ability to hand off to other agents. What changed is everything around that core: guardrails, tracing, hosted tools, sessions, the Responses API integration, voice pipeline support, and an actual release cadence with semantic versioning.

A TypeScript/JavaScript implementation (openai-agents-js) followed in 2025, bringing the same primitives to Node.js and browser environments. By May 2026, the SDK reached v0.17.1 with additions including configurable memory, sandbox-aware orchestration, and standardized MCP (Model Context Protocol) integrations.

Core Primitives: What the SDK Actually Provides

The Agents SDK is built around four core primitives plus an observability layer. Understanding these is essential to evaluating whether the framework fits your architecture.

1. Agent

An Agent is defined by three things: instructions (a system prompt), tools (functions it can call), and handoffs (other agents it can delegate to). Agents also specify which model they use, and different agents within the same application can use different models. A triage agent might run on gpt-4o-mini for speed while a reasoning agent uses o3 for complex analysis.

2. Runner

The Runner manages the agent execution loop. It calls the model, processes tool calls, handles handoffs between agents, and enforces guardrails. There are three execution modes: Runner.run() for complete execution, Runner.run_sync() for synchronous contexts, and Runner.run_streamed() for streaming token-by-token responses. The Runner handles the agentic loop internally—you provide the agent and input, it manages the rest.

3. Guardrails

Guardrails provide input and output validation that runs alongside agent execution. Input guardrails validate user messages before the agent processes them. Output guardrails validate the agent's final response before it reaches the user. You can write custom guardrail functions that use a lightweight LLM call to classify content, check for policy violations, or validate response format. Tool guardrails run on every custom function-tool invocation. Importantly, hosted tools (WebSearchTool, FileSearchTool, CodeInterpreterTool) and handoff calls bypass the tool guardrail pipeline.

4. Handoffs

Handoffs allow an agent to delegate tasks to another specialized agent within the same run. Under the hood, handoffs are exposed to the LLM as tool calls—a handoff to "Refund Agent" appears as a transfer_to_refund_agent tool. When the model invokes this tool, the Runner switches the active agent, carries over conversation context, and continues execution. This is the same pattern Swarm pioneered, now with production guardrails and tracing attached.

5. Tracing

The SDK includes built-in tracing that is enabled by default. Every agent run generates a comprehensive trace record capturing LLM generations, tool calls, handoffs, guardrail checks, and custom events. These traces are viewable in the OpenAI Traces dashboard for debugging and monitoring. For teams that need integration with existing observability stacks, the SDK supports OpenTelemetry export, letting you ship trace data to Datadog, Honeycomb, or any OTLP-compatible backend.

Built-in and Hosted Tools

The SDK distinguishes between hosted tools (executed server-side by OpenAI) and local execution tools (executed in your environment).

Hosted tools run on OpenAI's infrastructure and include:

  • WebSearchTool — Real-time web search with result citations
  • FileSearchTool — Semantic search over uploaded files using vector stores
  • CodeInterpreterTool — Sandboxed Python execution for data analysis and computation
  • ImageGenerationTool — DALL-E image generation within agent workflows
  • HostedMCPTool — Remote MCP server integration executed by OpenAI's API

Local execution tools run in your environment:

  • ComputerTool — Computer-use automation (screenshots, clicks, keyboard input)
  • ShellTool — Shell command execution
  • ApplyPatchTool — File patching for code modifications
  • LocalShellTool — Local shell access for development workflows
  • Custom function tools — Any Python or TypeScript function with automatic schema generation via type hints

Capability Table

Capability Status Details
Multi-agent handoffsNativeAgents delegate to specialized agents; context carried automatically
Input/output guardrailsNativeCustom validation functions, LLM-based classification supported
Tracing & observabilityNativeEnabled by default; OpenTelemetry export for external backends
Web searchHosted toolReal-time search with citations; runs on OpenAI infrastructure
File search (RAG)Hosted toolVector-store-based semantic search over uploaded documents
Code interpreterHosted toolSandboxed Python execution for computation and data analysis
Computer useLocal toolScreenshot, click, type automation via CUA models
Voice agentsSupportedPython VoicePipeline; TypeScript RealtimeAgent/RealtimeSession
StreamingNativeToken-by-token streaming via Runner.run_streamed()
Sessions / memoryConfigurableAuto conversation history; configurable memory added in 2026
MCP integrationSupportedBoth hosted and local MCP server connections
Sandbox orchestrationSupportedIsolated execution environments for agent workloads
Python SDKStableopenai-agents-python, v0.17+ as of May 2026
TypeScript SDKStableopenai-agents-js, same primitive set
OpenAI model supportFullGPT-4o, GPT-4o-mini, o3, o4-mini, gpt-realtime-2
Non-OpenAI modelsLimitedVia Model interface adapters; hosted tools require OpenAI models
State persistenceNot built-inNo native checkpointing; developers must implement persistence
Parallel agent executionNot built-inHandoff chains are sequential; parallel requires custom orchestration
Graph-based workflowsNot supportedLinear handoff chains only; no DAG or cycle support

Model Support and Provider Flexibility

The SDK is designed around OpenAI's model APIs, with GPT-4o, GPT-4o-mini, o3, o4-mini, and gpt-realtime-2 as first-class options. Different agents in the same application can use different models, letting you optimize cost and capability per task.

For non-OpenAI models, the SDK provides a Model interface that can be implemented as an adapter. The documentation describes the SDK as supporting "100+ LLMs" through this mechanism. In practice, the adapter approach works for basic agent loops (instructions, tool calls, handoffs), but hosted tools like WebSearchTool and FileSearchTool only work with OpenAI models. If you route an agent through an external model adapter, you lose access to the hosted tool catalog and must provide equivalent functionality through custom function tools.

Voice Agent Support

The SDK integrates with OpenAI's Realtime API for voice agent applications. The implementation differs by language:

  • Python: The VoicePipeline class wraps an existing text agent and adds speech-to-text and text-to-speech layers. This is the quickest path to voice-enabling an agent you have already built.
  • TypeScript: RealtimeAgent and RealtimeSession provide a browser-native voice assistant pattern with automatic interruption detection, context management, and the same guardrails and handoff patterns available in text agents.

Voice agents use the gpt-realtime-2 model and support function tools and hosted MCP tools within real-time sessions. This makes the SDK a viable option for building voice-based customer service agents, phone systems, or interactive voice assistants.

Pricing: What It Actually Costs

The Agents SDK itself is free and open-source under the MIT license. There is no licensing fee for the framework. The cost comes entirely from OpenAI API usage:

  • Model inference: Standard per-token pricing for whichever models your agents use (GPT-4o, o3, etc.)
  • Hosted tools: WebSearchTool, FileSearchTool, and CodeInterpreterTool have their own per-use pricing on top of token costs
  • Tracing: The OpenAI Traces dashboard is included with API access at no additional charge
  • Voice: Realtime API pricing applies for voice agent sessions (per-minute audio pricing)

For teams already committed to OpenAI's API, the SDK adds no marginal framework cost. For teams using external models through adapters, the SDK is still free, but you lose the hosted tools that represent a significant portion of the SDK's value proposition.

Where the OpenAI Agents SDK Falls Short

No framework is the right choice for every project. Here are five specific scenarios where the OpenAI Agents SDK creates friction or fails to deliver.

1. OpenAI Lock-in Is a Real Constraint

The SDK's most powerful features—hosted tools, tracing dashboard, voice agents, sessions—only work with OpenAI models. The external model adapter path exists, but it strips away hosted tools and reduces the SDK to a basic agent loop. If your architecture requires model portability (switching between Anthropic, Google, or open-source models based on cost or capability), the SDK's adapter story is too thin. Frameworks like LangGraph or the Vercel AI SDK provide first-class multi-provider support without capability degradation.

2. No Built-in State Persistence or Checkpointing

The SDK does not include native state persistence or checkpointing. If your workflow needs to pause for human approval and resume hours later, you are building that infrastructure yourself. If a long-running agent process crashes mid-execution, there is no checkpoint to recover from. LangGraph provides checkpointing natively, letting you save and restore agent state at arbitrary points. For workflows that involve human-in-the-loop approvals, multi-day execution spans, or crash recovery, the SDK's absence of persistence is a significant gap.

3. Linear Handoffs, Not Graph-Based Orchestration

The handoff pattern is elegant for linear delegation chains: triage agent hands off to specialist agent, specialist hands off to escalation agent. But it does not support parallel agent execution, conditional branching, or cyclical workflows natively. If you need to fan out work to three agents simultaneously and merge their results, or define complex DAG-based workflows with conditional routing, you are either building custom orchestration on top of the SDK or choosing the wrong framework. LangGraph and similar tools provide graph-based workflow definitions that handle these patterns natively.

4. Enterprise Features Are Thin

For production deployments at scale, the SDK lacks several features enterprise teams expect: there is no built-in retry logic or fallback routing for failed tool calls. Rate limiting, circuit breakers, and graceful degradation are left to the developer. Agent-to-agent communication across service boundaries (A2A protocol) has limited integration. Audit logging beyond tracing, role-based access control for agent capabilities, and multi-tenant isolation are not addressed. Teams building mission-critical agent systems will find themselves writing substantial infrastructure code around the SDK.

5. The Handoff Pattern Has a Scaling Ceiling

The handoff model works well with a handful of specialized agents. Reports from teams using the SDK indicate the pattern becomes unwieldy beyond eight to ten agent types. Each handoff is exposed as a tool to the LLM, and as the number of possible handoff targets grows, the model's ability to select the correct agent degrades. For systems requiring dozens of specialized agents, a routing layer or hierarchical agent architecture becomes necessary—infrastructure the SDK does not provide.

The Bottom Line

The OpenAI Agents SDK is a well-designed, production-ready framework for building multi-agent applications within the OpenAI ecosystem. Its core primitives—agents, handoffs, guardrails, and tracing—are genuinely well-conceived and minimal enough to learn quickly without sacrificing power. The hosted tool catalog (web search, file search, code interpreter) provides significant out-of-the-box capability. Voice agent support via the Realtime API adds a dimension most competing frameworks lack.

The SDK is the right choice if you are already committed to OpenAI's model APIs, need multi-agent coordination with handoffs, want built-in observability, and your workflows are primarily linear delegation chains with fewer than ten agent types. It is particularly strong for customer service applications, voice-enabled agents, and rapid prototyping of agentic systems.

It is not the right choice if you need model portability across providers, complex graph-based workflows, built-in state persistence, or enterprise infrastructure features like retry policies and circuit breakers. For those requirements, evaluate Claude and the Anthropic ecosystem, LangGraph for stateful graph workflows, or the Cursor IDE for AI-assisted development with multi-model support.

The honest assessment: the OpenAI Agents SDK is the best framework for building agents if and only if you have decided OpenAI models are your foundation. That is a reasonable decision for many teams. Just make it deliberately, because the framework's value proposition is tightly coupled to that choice.

This article may contain affiliate links. If you purchase through these links, we may earn a commission at no additional cost to you. See our full disclosure for details.

FAQ

Is the OpenAI Agents SDK free to use?
The SDK itself is free and open-source under the MIT license. Costs come from OpenAI API usage including per-token model inference, hosted tool fees for web search, file search, and code interpreter, and per-minute pricing for voice agent sessions via the Realtime API.
What is the difference between OpenAI Swarm and the Agents SDK?
Swarm was an experimental framework released in October 2024 with under 1,000 lines of code and no production support. The Agents SDK, released in March 2025, is the production-ready successor that adds guardrails, tracing, hosted tools, sessions, voice agent support, and an active release cadence. Swarm is officially deprecated and its repository redirects to the Agents SDK.
Can the OpenAI Agents SDK use non-OpenAI models like Claude or Gemini?
The SDK provides a Model interface for connecting external models through adapters, and documentation references support for 100+ LLMs. However, hosted tools like WebSearchTool, FileSearchTool, and CodeInterpreterTool only work with OpenAI models. Using external model adapters reduces the SDK to a basic agent loop without hosted tool access.
Does the OpenAI Agents SDK support voice agents?
Yes. The Python SDK provides VoicePipeline for wrapping existing text agents with speech-to-text and text-to-speech layers. The TypeScript SDK offers RealtimeAgent and RealtimeSession for browser-based voice assistants with interruption detection and context management. Voice agents use the gpt-realtime-2 model.
How does the OpenAI Agents SDK compare to LangGraph?
The Agents SDK excels at linear handoff chains, rapid prototyping, and tight OpenAI integration with built-in tools and tracing. LangGraph is stronger for complex graph-based workflows, state persistence with checkpointing, multi-provider model support, and enterprise features like crash recovery. Choose based on whether your workflows are primarily linear delegation or complex stateful graphs.
Does the OpenAI Agents SDK include built-in tracing and observability?
Yes. Tracing is enabled by default and captures LLM generations, tool calls, handoffs, guardrail checks, and custom events. Traces are viewable in the OpenAI Traces dashboard. For integration with external observability platforms, the SDK supports OpenTelemetry export to backends like Datadog and Honeycomb.

Related reads

Across the Wild Run AI network