What You're Actually Choosing Between
If you're building or evaluating AI agents in 2025, you've almost certainly hit this decision point: do you wire up your tools using the LLM (large language model) provider's native function calling API, or do you adopt MCP (Model Context Protocol) as your integration layer? The two are frequently confused or conflated in documentation, blog posts, and vendor marketing — and the confusion is understandable, because they solve overlapping but genuinely different problems.
Function calling is a runtime mechanism. It's the API-level handshake where a model says "call this function with these arguments," your code executes it, and the result goes back into context. MCP is a connectivity standard — an open protocol, originally published by Anthropic in late 2024, that defines how AI hosts, clients, and servers discover and communicate with tools and data sources in a standardized way. One is a low-level invocation pattern; the other is a higher-level interoperability layer.
This distinction matters because choosing wrong doesn't just create technical debt — it affects latency, tooling ecosystem access, model compatibility, and how much custom glue code you'll be maintaining six months from now. This article breaks down both approaches with architectural specifics, not abstractions.
Function Calling: The Established Baseline
How It Works
Function calling (also called "tool use" in Anthropic's API and some other providers) was formalized in OpenAI's GPT-4 API in 2023 and has since been adopted in some form by virtually every major LLM provider. The mechanics are straightforward:
- You define a set of tools in your API request as a JSON schema — name, description, parameters, and types.
- The model decides, mid-generation, whether to call a tool. If it does, it outputs a structured tool call object instead of (or alongside) text.
- Your application intercepts that object, executes the actual function, and injects the result back into the conversation as a tool message.
- The model resumes generation with the tool result in context.
This loop can repeat multiple times in a single agentic task — and in practice, complex coding agents like Cursor or Devin may make dozens of tool calls per task (file reads, terminal commands, search queries, etc.).
Supported Models and Providers
Function calling support is near-universal among frontier models:
| Provider / Model | Function Calling Support | Parallel Tool Calls | Max Tools per Request |
|---|---|---|---|
| OpenAI GPT-4o | ✅ Native | ✅ Yes | 128 |
| Anthropic Claude 3.5 / Sonnet 4 | ✅ Native (called "tool use") | ✅ Yes | Varies by model |
| Google Gemini 1.5 / 2.0 | ✅ Native | ✅ Yes | ~128 |
| Mistral Large | ✅ Native | Limited | ~64 |
| Meta Llama 3.x (self-hosted) | ⚠️ Via prompt engineering / fine-tunes | Unreliable | Model-dependent |
The key constraint with function calling is that your tool definitions live inside the API request — they consume input tokens. On GPT-4o at $5 per million input tokens (as of mid-2025; verify current pricing at openai.com/pricing), a 50-tool schema can add 3,000–8,000 tokens to every request, which accumulates fast in high-frequency agentic loops.
Where Function Calling Shines
- Tight, predictable tool sets. When you have 5–20 well-defined tools that don't change, function calling gives you maximum control with minimal overhead.
- Custom application stacks. If you're building a purpose-built agent (a customer support bot, a data analysis pipeline), you don't need a standardized discovery layer — you know exactly what tools exist.
- OpenAI API users. The OpenAI ecosystem has not adopted MCP natively as of mid-2025. Function calling is the right primitive here.
- Latency-sensitive applications. Direct in-process function calls have no transport overhead. MCP servers communicate over stdio or SSE, which adds round-trip time.
- Fine-grained result control. You decide exactly how tool results are formatted and injected back into context — no protocol intermediary.
MCP: The Interoperability Layer
What MCP Actually Is
Anthropic published the Model Context Protocol (MCP) specification in November 2024 as an open standard. The core idea: instead of every AI host application (Cursor, Claude Desktop, Windsurf, your custom agent) building bespoke integrations for every tool (GitHub, Postgres, Slack, filesystem, web search), there should be a common protocol that any client can speak to any server.
MCP defines three primary primitives:
- Tools — callable functions that the model can invoke (analogous to function calling)
- Resources — data sources the model can read (files, database rows, API responses)
- Prompts — pre-defined prompt templates the host can surface to users
The architecture has three roles: MCP Hosts (Claude Desktop, Cursor, Windsurf, your app), MCP Clients (the component inside the host that speaks MCP), and MCP Servers (lightweight processes that expose tools and resources). Servers communicate with clients over either stdio (local subprocess) or SSE (remote HTTP).
The Ecosystem in Mid-2025
The MCP server ecosystem has grown rapidly since the November 2024 launch. As of mid-2025, there are several hundred official and community MCP servers covering:
- Filesystem access (official Anthropic server)
- GitHub, GitLab (repository operations, PR management)
- PostgreSQL, SQLite (direct query execution)
- Brave Search, Exa (web search)
- Slack, Linear, Notion (productivity tools)
- Docker, Kubernetes (infra operations)
- Browser automation (Puppeteer-based servers)
Windsurf and Cursor both support MCP servers in their IDE (Integrated Development Environment) agent modes. Claude Desktop has supported MCP since launch. The GitHub Copilot ecosystem has not adopted MCP natively as of this writing. OpenAI's agent infrastructure uses its own tool-calling conventions.
Disclosure: We earn referral commissions from select partners. This doesn't influence our reviews — we recommend based on research, not revenue.
Where MCP Shines
- Shared tooling across multiple projects and hosts. Write one MCP server for your Postgres database, and every MCP-compatible host in your stack can use it — no re-implementation.
- Dynamic tool discovery. MCP clients can query servers at runtime for what tools are available. This enables richer, more adaptive agent behaviors without hardcoding tool schemas into every API call.
- Agentic platforms and IDEs. If you're working inside Cursor, Windsurf, or building on Claude Desktop, MCP is the native extension point.
- Reducing integration maintenance burden. With a well-maintained community MCP server for a tool you use (GitHub, Linear, etc.), you're not responsible for keeping the integration current.
- Multi-agent architectures. MCP's client/server model maps naturally onto multi-agent setups where sub-agents expose capabilities as MCP servers to an orchestrating agent.
Architectural Comparison: Side by Side
| Dimension | Function Calling | MCP |
|---|---|---|
| Abstraction level | Low — direct LLM API primitive | High — protocol layer above the LLM |
| Tool discovery | Static — defined per request | Dynamic — clients query servers at runtime |
| Transport | In-process / direct API call | stdio (local) or SSE/HTTP (remote) |
| Latency overhead | Minimal | Low–moderate (subprocess or network round-trip) |
| Model compatibility | All major providers | Anthropic-native; OpenAI not supported natively |
| Ecosystem / reuse | DIY — you build each integration | Growing library of pre-built servers |
| State management | Stateless (per API call) | Persistent connection; servers can maintain state |
| Security surface | Controlled by your app | Expanded — each MCP server is an attack surface |
| Best for | Custom apps, OpenAI stack, tight tool sets | Agent platforms, IDE agents, multi-tool ecosystems |
| Spec stability | Mature, stable across providers | Evolving — spec revisions expected through 2025 |
The Token Cost Reality
One underappreciated practical difference is how each approach handles token consumption. With function calling, every tool definition is serialized into your prompt on every API request. A moderately complex tool schema — say, a GitHub tool with create_issue, list_prs, merge_branch, and similar operations — might consume 800–1,500 tokens per request just for definitions. Multiply that across a 30-step agentic task and you're adding 24,000–45,000 tokens in schema overhead alone.
MCP doesn't eliminate this — the model still needs to know what tools exist, and MCP clients inject tool definitions into context — but MCP servers can expose tools more selectively based on context, and some MCP host implementations cache or compress tool descriptions more aggressively than raw JSON schema injection. The net effect depends heavily on the host implementation. This is an area where vendor behavior varies significantly; check the changelog of whatever MCP host you're evaluating.
Security Considerations
This is not discussed enough in the agent builder community. Both approaches carry real risks, but they differ in character.
With function calling, your attack surface is your own code — the functions your application exposes to the model. You control exactly what the model can call, with what arguments, and you validate results before injecting them into context. Prompt injection can still trick the model into calling legitimate functions with malicious arguments, but the blast radius is bounded by your function definitions.
With MCP servers, you're running external processes (or connecting to remote servers) that the model has authority to invoke. A malicious or compromised MCP server could exfiltrate data, execute commands, or manipulate the model's context. The MCP specification includes guidance on authorization and authentication, but implementation quality varies widely across community servers. Before connecting a third-party MCP server to an agent with write access to your codebase or database, treat it with the same scrutiny you'd apply to any third-party package with root access to your system.
Anthropic's own official MCP servers (filesystem, GitHub integration, etc.) are open source and auditable. Community servers are not uniformly so.
When This Is NOT the Right Choice
When MCP Is Not the Right Choice
- You're building on OpenAI's API stack. MCP has no native support in OpenAI's infrastructure. You can build adapters, but you're now maintaining a compatibility layer that the MCP spec may break under you as it evolves. Stick with function calling.
- You need sub-100ms tool invocation latency. MCP's stdio and SSE transports add overhead. If you're building a real-time agent where tool calls happen in tight loops (think: a trading system, a high-frequency automation), direct in-process function calls will be faster and more predictable.
- Your tool set is fixed and small (under 10 tools). The overhead of running an MCP server process, managing its lifecycle, and handling its protocol communication is not justified for a static 5-tool integration. Function calling is simpler, easier to debug, and less to maintain.
- You need production-grade stability today. The MCP specification is still evolving as of mid-2025. Breaking changes to the protocol or host implementations have occurred since the November 2024 launch. For mission-critical production systems, function calling's stability across providers is a meaningful advantage.
- You're running fully local / air-gapped agents. While MCP supports stdio transport for local servers, the ecosystem of community servers often assumes external network access. Rolling your own local-only MCP environment adds complexity that often isn't worth it versus direct function integration.
When Function Calling Is Not the Right Choice
- You're extending Cursor, Windsurf, or Claude Desktop. These hosts expose MCP as their extension API. Fighting against that with custom function-calling injection adds unnecessary friction.
- You need your tools to be reusable across multiple agent projects. Rebuilding the same GitHub or database integration as JSON schemas for every new project is real maintenance overhead that MCP's server model eliminates.
- You want tools to maintain session state between invocations. Function calling is stateless at the protocol level — each call is independent. MCP servers can maintain connection state, which matters for things like holding an authenticated database cursor across multiple queries in a single agent session.
Practical Decision Framework
Use this as a starting heuristic, not a rulebook:
- Building a custom app on OpenAI/GPT-4o? → Function calling. Full stop.
- Extending Cursor, Windsurf, or Claude Desktop? → MCP.
- Building a new agent framework from scratch, long-term? → MCP for external integrations (databases, APIs, services), function calling for internal app logic.
- Prototyping quickly? → Function calling. Fewer moving parts, faster iteration.
- Need to share tooling across a team with multiple agent projects? → MCP servers. The up-front cost pays off in reuse.
- Production system, stability required now? → Function calling. Revisit MCP in 6–12 months as the spec matures.
Bottom Line
MCP and function calling are not competitors — they're different layers of the same stack. Function calling is the execution primitive: how a model triggers a tool at the API level. MCP is the connectivity standard: how tools and data sources are discovered, connected, and reused across agent hosts. If you're building on OpenAI, function calling is your only real option today. If you're working inside the Anthropic/Claude ecosystem or using MCP-native IDEs like Cursor or Windsurf, MCP's server ecosystem can save you significant integration work — provided you're willing to accept a protocol that's still actively evolving.
The honest answer for most teams building agent infrastructure in mid-2025: start with function calling to move fast and maintain control, and selectively adopt MCP servers for high-value, reusable integrations (GitHub, databases, file systems) as the ecosystem matures. Don't rewrite working function-calling integrations just to adopt MCP for its own sake. Evaluate both against your actual constraints — model provider, latency budget, team size, and production stability requirements — before committing either direction. Capabilities and pricing across this space shift frequently; verify specifics on official documentation before making architectural decisions.