Voice AI pricing is deliberately confusing. One vendor quotes per-minute rates. Another charges per character. A third invented its own credit system. A fourth bills per second but only while the call is connected—not while it rings. Stack a speech-to-text provider on top of an LLM on top of a text-to-speech engine on top of a telephony carrier, and the real cost of a single phone call becomes genuinely difficult to calculate before you have built and tested the entire pipeline.
This guide exists because founders, developers, and operations leads searching for a voice AI pricing comparison need a single reference that puts every major platform side by side with consistent units, honest cost projections, and clear warnings about where bills inflate beyond the sticker price. We are not benchmarking voice quality or latency here. This is strictly about money: what each platform charges, how the billing model works, where overages and hidden fees appear, and which option makes financial sense at your call volume.
Pricing data is sourced from official pricing pages, published documentation, and developer community reports current as of May 2026. All prices are in USD. Where platforms offer annual discounts, we note both monthly and annual rates. Where pricing has changed recently, we flag it.
Platform-by-Platform Pricing Breakdown
Below is a detailed breakdown of every major voice AI platform. After the individual sections, we consolidate everything into a master comparison table and run total-cost-of-ownership projections at multiple volume tiers.
ElevenLabs
ElevenLabs is the dominant text-to-speech and voice cloning platform. It offers both standalone TTS/STT APIs and a full Conversational AI product for building voice agents. TTS plans are sold as monthly character allocations (credits), while Conversational AI is billed at a per-minute rate.
TTS and Voice API Plans
- Free: 10,000 characters per month. 3 custom voices. Non-commercial use only. Access to standard voice models.
- Starter ($5/month): 30,000 characters per month. 10 custom voices. Commercial license included. API access.
- Creator ($22/month): 100,000 characters per month. 30 custom voices. Professional Voice Cloning. Priority queue.
- Pro ($99/month): 500,000 characters per month. 160 custom voices. Higher-quality voice cloning with fewer training samples. 44.1 kHz audio output.
- Scale ($330/month): 2,000,000 characters per month. 660 custom voices. Enterprise-grade uptime SLA. Volume pricing on overages.
- Business (Custom): Negotiated volume. Custom model training. Dedicated infrastructure. Priority support and SLA guarantees.
Conversational AI Pricing
ElevenLabs Conversational AI is priced separately from the TTS API. The platform bundles STT, LLM orchestration, and TTS into a single per-minute rate.
- Per-minute cost: $0.08–$0.12/min depending on plan tier and voice model selection.
- Included minutes: Higher-tier plans include a monthly allocation of conversational minutes. Overages billed at the per-minute rate.
- LLM costs: If you use ElevenLabs’ hosted LLM, the cost is bundled. If you bring your own LLM via API, you pay that provider separately.
Billing model: Monthly subscription with character or minute allocations. Overages billed per unit. Annual billing saves roughly 20%. The credit system can be unintuitive: different voice models consume credits at different rates, with Turbo v3 consuming roughly one credit per character and older models using higher multipliers.
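To compare tiers on a common unit, the plan prices and character allocations listed above can be converted into an effective cost per 1,000 characters. The sketch below is illustrative only: it assumes the full monthly allocation is used, ignores overages and annual discounts, and the `model_multiplier` parameter is a hypothetical stand-in for models that consume more than one credit per character.

```python
# Monthly price (USD) and character allocation per plan, as listed above.
PLANS = {
    "Starter": (5, 30_000),
    "Creator": (22, 100_000),
    "Pro": (99, 500_000),
    "Scale": (330, 2_000_000),
}


def cost_per_1k_chars(price: float, characters: int, model_multiplier: float = 1.0) -> float:
    """Effective USD per 1,000 characters when the whole allocation is used.

    model_multiplier is an assumed credit multiplier (1.0 = one credit per
    character); a higher value shrinks the usable allocation.
    """
    usable_characters = characters / model_multiplier
    return price / usable_characters * 1000


for name, (price, characters) in PLANS.items():
    print(f"{name}: ${cost_per_1k_chars(price, characters):.3f} per 1K characters")
```

On these figures the per-character rate does not fall steadily as the plan price rises, so a larger plan pays off mainly when you need the allocation or the features attached to it.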
Vapi
Vapi is a voice agent orchestration platform. It does not provide its own TTS, STT, or LLM—instead, it connects to providers like ElevenLabs, Deepgram, OpenAI, and Anthropic, and charges a platform fee on top of the underlying provider costs.
- Platform fee: $0.05/min for all calls processed through Vapi.
- Provider costs (passed through): You pay each underlying provider at their standard rates. A typical stack: Deepgram STT ($0.0043/min), Claude or GPT-4o for the LLM ($0.01–$0.03 per conversation minute, depending on tokens processed), ElevenLabs TTS ($0.02–$0.04/min).
- Total typical cost: $0.08–$0.15/min depending on your provider choices and conversation complexity.
- Telephony: Vapi includes Twilio integration. Twilio charges apply separately—approximately $0.013/min for US calls plus $1/month per phone number.
- Enterprise: Custom platform fee rates available at volume. Dedicated infrastructure options.
Billing model: Pay-as-you-go with no minimum commitment. The platform fee is straightforward, but total cost depends entirely on which providers you select. This flexibility is both the advantage and the complexity—you can optimize each component, but you need to track four or five separate bills.
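Because the total is a sum of independent line items, it helps to model the stack explicitly. The following sketch adds up the per-minute figures quoted above. The class name `StackRates` and the two sample configurations are illustrative, and real rates will vary with your providers and how token-heavy your conversations are.

```python
from dataclasses import dataclass


@dataclass
class StackRates:
    """Per-minute rates (USD) for each layer of an orchestrated voice stack."""
    platform_fee: float
    stt: float
    llm: float
    tts: float
    telephony: float = 0.0

    def per_minute(self) -> float:
        """All-in cost per minute: the plain sum of every layer."""
        return self.platform_fee + self.stt + self.llm + self.tts + self.telephony


# Low end of the quoted ranges, without carrier charges.
lean = StackRates(platform_fee=0.05, stt=0.0043, llm=0.01, tts=0.02)
# High end of the quoted ranges, plus Twilio US carrier minutes.
rich = StackRates(platform_fee=0.05, stt=0.0043, llm=0.03, tts=0.04, telephony=0.013)

for label, stack in (("lean", lean), ("rich", rich)):
    print(f"{label}: ${stack.per_minute():.4f}/min")
```

Keeping each layer as its own field makes it easy to swap one provider rate and see how much of the total the platform fee really represents.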
Retell AI
Retell AI provides a voice agent platform focused on customer service and sales automation. It bundles STT, LLM orchestration, and TTS but allows provider selection within its platform.
- Pay-as-you-go: No monthly subscription. Billed per minute of call time.
- Per-minute cost: Approximately $0.07–$0.15/min depending on the STT, LLM, and TTS providers selected within the platform.
- Included telephony: Retell provides Twilio-based telephony on certain plans, but phone numbers and carrier minutes are still billed separately.
- Enterprise plans: Volume discounts, dedicated support, custom SLAs, and SSO available at higher tiers.
- Free trial: Limited free minutes for testing and development.
Billing model: Usage-based with transparent per-minute breakdown. Retell shows the cost contribution of each provider in your stack, making it easier to optimize. The platform fee component is embedded in the per-minute rate rather than itemized separately.
Bland AI
Bland AI focuses on outbound and inbound phone call automation. It positions itself as a turnkey solution for businesses that want voice agents handling calls without managing individual provider relationships.
- Connected call rate: $0.09/min. You are only billed for connected time—ringing, voicemail detection, and failed connections are not charged.
- Included in rate: STT, LLM processing, TTS, and basic telephony are bundled into the per-minute price.
- Phone numbers: Included in the platform—no separate Twilio account needed for basic usage.
- Enterprise pricing: Custom rates for high-volume deployments (50,000+ minutes/month). Dedicated numbers, custom voice training, priority routing.
- API access: Full API for programmatic call management, scheduling, and webhook integrations.
Billing model: Simple per-minute on connected calls. The bundled approach makes cost projection straightforward—multiply your expected connected minutes by $0.09. The trade-off is less provider flexibility; you use Bland’s selected stack rather than choosing your own STT/TTS/LLM combination.
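Since only connected time is billed, the useful input is connected minutes rather than dials. Here is a minimal sketch of that projection. The connect rate and average call length are hypothetical campaign assumptions, not Bland figures.

```python
BLAND_RATE_PER_MIN = 0.09  # connected-call rate quoted above (USD)


def bland_monthly_cost(dials: int, connect_rate: float, avg_connected_minutes: float) -> float:
    """Estimate monthly spend from dial volume; unconnected dials cost nothing."""
    connected_calls = dials * connect_rate
    connected_minutes = connected_calls * avg_connected_minutes
    return connected_minutes * BLAND_RATE_PER_MIN


# Hypothetical outbound campaign: 2,000 dials, 60% connect, 3 minutes per connected call.
estimate = bland_monthly_cost(dials=2_000, connect_rate=0.60, avg_connected_minutes=3.0)
print(f"Estimated monthly cost: ${estimate:,.2f}")
```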
Play.ai
Play.ai offers text-to-speech and voice agent capabilities with a focus on realistic voice cloning and multi-language support.
- Free tier: Limited monthly character allocation for TTS. Access to standard voices.
- Pro plans: Tiered monthly subscriptions starting around $20–$30/month with increasing character allocations and access to premium voices.
- Voice agent platform: Per-minute pricing for conversational AI, competitive with ElevenLabs at similar volume tiers.
- API access: Available on paid plans. Per-character billing for TTS API usage beyond plan allocations.
- Enterprise: Custom pricing with volume discounts, dedicated support, and custom voice model training.
Billing model: Subscription plus overages. The free tier is useful for prototyping but insufficient for production workloads. Play.ai is a newer entrant, and pricing has been adjusted multiple times—verify current rates on their pricing page before committing.
Amazon Connect + Amazon Lex
Amazon Connect paired with Amazon Lex represents the hyperscaler approach to voice AI. It is designed for enterprise contact centers and scales to millions of minutes, but the pricing structure reflects AWS’s component-level billing philosophy.
- Amazon Connect telephony: $0.002/second ($0.12/min) for inbound calls. Outbound rates vary by destination. No per-seat or per-agent licensing.
- Amazon Lex: $0.004 per speech request (one request per user utterance). A typical 5-minute call with 15–20 user turns costs $0.06–$0.08 in Lex charges alone.
- Amazon Polly (TTS): $4.00 per 1 million characters for standard voices, $16.00 per 1 million characters for neural voices.
- Amazon Bedrock (LLM, if used): Per-token pricing varies by model. Claude via Bedrock adds $0.003–$0.015 per 1K input tokens depending on model.
- Total estimated cost: $0.15–$0.25/min for a fully functional voice agent depending on conversation complexity and model selection.
Billing model: Pure pay-as-you-go with no minimums. Every component is billed separately. The advantage is granular cost control and virtually unlimited scale. The disadvantage is operational complexity—you are managing and paying for five or more AWS services independently, and your monthly bill requires a spreadsheet to predict accurately.
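That spreadsheet can start as a few lines of code. The sketch below breaks a single call into the per-service charges listed above. The function name, the 3,000-character TTS volume, and the default of zero Bedrock cost are assumptions for illustration; plug in your own transcript statistics.

```python
CONNECT_PER_SECOND = 0.002        # inbound telephony (USD)
LEX_PER_SPEECH_REQUEST = 0.004    # one request per user utterance
POLLY_PER_CHAR = {"standard": 4.00 / 1_000_000, "neural": 16.00 / 1_000_000}


def aws_call_cost(seconds: int, user_turns: int, tts_characters: int,
                  voice: str = "neural", bedrock_cost: float = 0.0) -> dict[str, float]:
    """Break one call into per-service charges (USD) and add a total."""
    items = {
        "connect": seconds * CONNECT_PER_SECOND,
        "lex": user_turns * LEX_PER_SPEECH_REQUEST,
        "polly": tts_characters * POLLY_PER_CHAR[voice],
        "bedrock": bedrock_cost,  # pass your own per-call LLM estimate
    }
    items["total"] = sum(items.values())
    return items


# Assumed call shape: 5 minutes, 18 user turns, 3,000 synthesized characters.
call = aws_call_cost(seconds=300, user_turns=18, tts_characters=3_000)
for service, cost in call.items():
    print(f"{service:>8}: ${cost:.4f}")
print(f"per minute: ${call['total'] / 5:.4f}")
```

With these inputs the call comes to about $0.72, or roughly $0.144/min before any Bedrock charges, and telephony is by far the largest line item.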
Google Contact Center AI (CCAI)
Google CCAI is Google’s enterprise contact center platform. It integrates Dialogflow CX for conversational AI, Google Cloud STT/TTS, and optional Gemini models for generative responses.
- Dialogflow CX: $20 per 100 sessions (a session is a single conversation). Additional charges for audio input/output.
- Google Cloud STT: $0.006–$0.009 per 15 seconds of audio, depending on model selection and features (speaker diarization, punctuation, etc.).
- Google Cloud TTS: $4.00 per 1 million characters for standard, $16.00 per 1 million characters for WaveNet/Neural2 voices.
- Agent Assist / Generative AI features: Additional per-session and per-token charges for Gemini-powered suggestions and responses.
- Enterprise pricing: Custom agreements for large deployments. Google typically negotiates committed-use discounts.
Billing model: Session-based for Dialogflow CX, usage-based for STT/TTS. Like AWS, the component-level billing provides granular control but requires careful cost modeling. CCAI is primarily sold to enterprises through Google Cloud sales teams, and public pricing may not reflect negotiated rates.
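To line session-based billing up against per-minute platforms, each component has to be converted to a per-minute figure. The sketch below does this for the three publicly priced items above. It assumes STT is billed on the full call duration and uses a hypothetical TTS volume of 600 characters per minute. It leaves out Dialogflow audio charges, generative AI features, and telephony, so the result is a lower bound rather than an all-in rate.

```python
DIALOGFLOW_PER_SESSION = 20.00 / 100  # $20 per 100 sessions


def ccai_floor_per_minute(call_minutes: float, stt_per_15s: float,
                          tts_chars_per_minute: float, tts_per_million: float) -> float:
    """Lower-bound USD per minute from session, STT, and TTS charges only."""
    session = DIALOGFLOW_PER_SESSION / call_minutes      # flat fee spread over the call
    stt = stt_per_15s * 4                                # four 15-second blocks per minute
    tts = tts_chars_per_minute * tts_per_million / 1_000_000
    return session + stt + tts


for stt_rate in (0.006, 0.009):
    floor = ccai_floor_per_minute(call_minutes=5, stt_per_15s=stt_rate,
                                  tts_chars_per_minute=600, tts_per_million=16.00)
    print(f"STT ${stt_rate}/15s -> at least ${floor:.4f}/min")
```

Note how the flat session fee weighs more heavily on short calls: halving the call length doubles its per-minute share.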
Master Pricing Comparison Table
The following table normalizes pricing across all platforms into comparable per-minute rates. Where platforms use different billing units, we have converted to per-minute equivalents based on typical conversation patterns (average call length: 4–6 minutes, 15–20 conversational turns per call).
| Platform | Base Rate (per min unless noted) | Typical All-In Cost (per min) | Billing Model | Telephony Included | Best For |
|---|---|---|---|---|---|
| ElevenLabs Conversational AI | $0.08–$0.12 | $0.10–$0.16 | Per-minute + subscription | No (BYO Twilio) | Premium voice quality |
| Vapi | $0.05 platform fee | $0.08–$0.15 | Platform fee + provider pass-through | No (Twilio integration) | Maximum provider flexibility |
| Retell AI | $0.07–$0.10 | $0.07–$0.15 | Per-minute, usage-based | Partial | Transparent cost breakdown |
| Bland AI | $0.09 | $0.09–$0.12 | Per connected minute, bundled | Yes | Simple, predictable billing |
| Play.ai | $0.08–$0.12 | $0.10–$0.18 | Subscription + overages | No | Voice cloning, multilingual |
| Amazon Connect + Lex | $0.12 telephony + Lex | $0.15–$0.25 | Component-level pay-as-you-go | Yes (Connect) | Enterprise scale, AWS ecosystem |
| Google CCAI | $0.20/session + STT/TTS | $0.18–$0.30 | Session + component usage | Via carrier integration | Enterprise, Google Cloud shops |
Total Cost of Ownership by Volume
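Sticker rates tell only part of the story. The table below projects total monthly spend at four volume tiers, including platform fees, typical provider costs, telephony, and phone number charges. Estimates assume US domestic calls with an average duration of 5 minutes. The 1,000- and 10,000-minute rows are the all-in per-minute ranges multiplied out; the 50,000- and 100,000-minute rows come in at or below straight multiplication, consistent with the volume discounts that become negotiable at that scale.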
| Monthly Minutes | ElevenLabs | Vapi | Retell AI | Bland AI | Amazon Connect | Google CCAI |
|---|---|---|---|---|---|---|
| 1,000 min | $100–$160 | $80–$150 | $70–$150 | $90–$120 | $150–$250 | $180–$300 |
| 10,000 min | $1,000–$1,600 | $800–$1,500 | $700–$1,500 | $900–$1,200 | $1,500–$2,500 | $1,800–$3,000 |
| 50,000 min | $4,500–$7,000 | $3,500–$6,500 | $3,000–$6,000 | $4,500–$5,500 | $6,500–$10,000 | $7,500–$12,000 |
| 100,000 min | $8,000–$12,000 | $6,500–$12,000 | $5,500–$10,000 | $8,000–$9,500 | $12,000–$20,000 | $14,000–$25,000 |
Key observations from the volume projections:
- At low volume (1K–10K minutes): Bland AI and Vapi offer the most predictable and competitive pricing. Bland’s bundled approach eliminates multi-vendor billing complexity. Vapi’s flexibility lets you optimize each component but requires more management overhead.
- At mid volume (10K–50K minutes): Retell AI and Vapi become increasingly competitive as their per-minute rates hold steady. ElevenLabs remains viable if voice quality is the primary differentiator. Enterprise negotiations become available at most platforms.
- At high volume (50K–100K+ minutes): Enterprise pricing negotiations matter more than public rates. Amazon Connect’s per-second billing becomes advantageous for short calls. All platforms offer custom rates at this tier. The gap between the lowest and highest cost option can exceed $15,000/month.
- Hyperscaler premium: Amazon Connect and Google CCAI consistently cost 40–80% more than purpose-built voice AI platforms at equivalent volumes. The premium buys enterprise compliance certifications, global telephony infrastructure, and deep ecosystem integration—but if you do not need those, you are paying for capability you will not use.
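One way to read the volume table is to compare each projection with a straight multiplication of the low-end all-in rate from the master comparison table. The sketch below does that for the low end of each range. The resulting percentage is only an implied figure derived from the numbers in this guide, not a discount any vendor publishes.

```python
# Low-end all-in rates (USD/min) from the master comparison table.
LOW_END_RATE = {
    "ElevenLabs": 0.10, "Vapi": 0.08, "Retell AI": 0.07,
    "Bland AI": 0.09, "Amazon Connect": 0.15, "Google CCAI": 0.18,
}
# Low-end monthly projections (USD) from the volume table.
PROJECTED_LOW_END = {
    50_000: {"ElevenLabs": 4_500, "Vapi": 3_500, "Retell AI": 3_000,
             "Bland AI": 4_500, "Amazon Connect": 6_500, "Google CCAI": 7_500},
    100_000: {"ElevenLabs": 8_000, "Vapi": 6_500, "Retell AI": 5_500,
              "Bland AI": 8_000, "Amazon Connect": 12_000, "Google CCAI": 14_000},
}


def implied_discount(rate_per_min: float, minutes: int, projected: float) -> float:
    """Fraction by which a projection undercuts rate_per_min * minutes."""
    return 1 - projected / (rate_per_min * minutes)


for minutes, row in PROJECTED_LOW_END.items():
    for platform, projected in row.items():
        discount = implied_discount(LOW_END_RATE[platform], minutes, projected)
        print(f"{minutes:>7,} min  {platform:<15} {discount:6.1%}")
```

Running the same comparison against your own negotiated quote is a quick check on whether a volume offer actually beats list price.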
Hidden Costs That Inflate Your Bill
Every voice AI platform has costs that are not immediately visible on the pricing page. These are the ones that consistently surprise teams after the first month of production deployment.
Provider Stacking
Platforms like Vapi and Retell that let you choose your own STT, LLM, and TTS providers give you flexibility but also create provider stacking costs. A seemingly cheap $0.05/min platform fee becomes $0.08–$0.15/min once you add Deepgram for STT, Claude or GPT-4o for reasoning, and ElevenLabs for TTS. Each provider has its own minimum charges, billing cycles, and overage rates.
Telephony and Phone Numbers
Unless you use Bland AI or Amazon Connect (which bundle telephony), you need a Twilio or equivalent account. Costs include:
- Phone numbers: $1–$2/month per number (local). Toll-free numbers cost $2–$5/month.
- Per-minute carrier charges: $0.0085–$0.013/min for US calls. International rates vary dramatically.
- SMS for verification/follow-up: $0.0079/message outbound in the US.
- Number porting: Often free but can take 2–4 weeks and cause temporary service disruption.
At 10,000 minutes/month, Twilio carrier minutes alone add $85–$130 to your bill, before phone number fees and any AI processing costs.
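The telephony line items above combine into a simple monthly estimate. This is a rough sketch using the US rates quoted in the list. The function and parameter names are illustrative, and toll-free numbers and international minutes are left out.

```python
def telephony_monthly(minutes: int, carrier_rate: float, numbers: int = 0,
                      number_fee: float = 1.00, sms_messages: int = 0,
                      sms_rate: float = 0.0079) -> float:
    """Monthly telephony spend (USD): carrier minutes + number rental + outbound SMS."""
    return minutes * carrier_rate + numbers * number_fee + sms_messages * sms_rate


# 10,000 minutes at the low and high US carrier rates, with two local numbers.
low = telephony_monthly(10_000, carrier_rate=0.0085, numbers=2)
high = telephony_monthly(10_000, carrier_rate=0.013, numbers=2)
print(f"Telephony: ${low:,.2f} to ${high:,.2f} per month")
```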
Premium Voice and Custom Model Costs
Standard voices are included in base pricing, but premium capabilities cost extra:
- Professional voice cloning (ElevenLabs): Requires Creator tier ($22/month) minimum. High-quality instant voice cloning needs Pro ($99/month).
- Custom-trained models: An enterprise-tier feature on every platform covered here. It typically requires a custom-priced Business or Enterprise plan, above the $330/month Scale tier.
- Neural/premium TTS voices (AWS Polly, Google TTS): 4x the cost of standard voices. A seemingly minor choice that quadruples one cost component.
- Low-latency models: Some platforms charge premium rates for low-latency inference. The reverse can also hold: on Vapi, choosing a smaller, faster LLM (GPT-4o-mini instead of GPT-4o) can halve the LLM cost component.
Support Tier Costs
Free and starter tiers include community support only. Production deployments typically need:
- Priority support: Usually included at Pro/Enterprise tiers ($99–$330+/month plans).
- Dedicated Slack/Discord channel: Enterprise tier at most platforms.
- SLA guarantees: Only available on Scale/Enterprise plans. If uptime matters for your business, you cannot use starter tiers in production.
- AWS/Google support: AWS Business Support starts at $100/month or 10% of monthly charges (whichever is higher). Google Cloud support has similar tiered pricing.
Development and Integration Costs
Often overlooked but substantial:
- Developer time for integration: Bland AI and ElevenLabs offer the quickest setup (hours to days). Vapi with custom providers requires more configuration (days to weeks). Amazon Connect and Google CCAI require significant engineering investment (weeks to months).
- Testing minutes: Development and QA consume billable minutes. Budget 500–2,000 minutes per month for testing during active development.
- Prompt engineering and tuning: LLM-powered voice agents require ongoing prompt optimization. Each iteration consumes billable API calls and developer time.
Budget Recommendations by Scale and Use Case
Solo Founders and Early-Stage Startups (Under 1,000 Minutes/Month)
Start with Bland AI or Vapi. Bland’s bundled pricing eliminates the complexity of managing multiple provider accounts while you validate your voice AI use case. Vapi gives more control if you have specific voice quality or LLM requirements. Budget $80–$150/month all-in. Avoid AWS and Google at this stage—the integration overhead alone will cost more than a year of Bland usage.
Growing Businesses (1,000–10,000 Minutes/Month)
Evaluate Retell AI and Vapi for their transparent per-minute breakdown and provider flexibility. If voice quality is your primary differentiator (luxury brands, premium services), consider ElevenLabs Conversational AI despite the higher per-minute cost. Budget $500–$1,500/month. Begin negotiating with your preferred platform for volume discounts—most offer custom rates above 5,000 minutes/month.
Mid-Market (10,000–50,000 Minutes/Month)
At this volume, enterprise negotiations become critical. Get custom quotes from at least three platforms. Retell AI and Vapi typically offer the most competitive negotiated rates. Bland AI’s simplicity becomes increasingly attractive as operational complexity costs real engineering time. Budget $3,000–$7,000/month. Assign someone to monitor usage and optimize provider selection quarterly.
Enterprise (50,000+ Minutes/Month)
Every platform offers custom enterprise pricing at this volume. Amazon Connect and Google CCAI become viable options if you need HIPAA compliance, FedRAMP authorization, or deep integration with existing cloud infrastructure. Their higher per-minute costs are offset by compliance certifications that purpose-built platforms may not have. Budget $5,000–$20,000+/month depending on provider and feature requirements. Negotiate annual commitments for 20–40% discounts.
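The four tiers above amount to a lookup table keyed on monthly minutes. A minimal sketch of that lookup follows. The `Tier` type and the boundary handling (exactly 1,000 minutes falls into the next tier up) are assumptions, since the ranges above share their endpoints.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    max_minutes: float            # exclusive upper bound (assumed boundary handling)
    budget: str                   # monthly budget range from this guide
    shortlist: tuple[str, ...]    # platforms suggested for the tier


TIERS = [
    Tier("Solo founders and early-stage startups", 1_000, "$80–$150",
         ("Bland AI", "Vapi")),
    Tier("Growing businesses", 10_000, "$500–$1,500",
         ("Retell AI", "Vapi", "ElevenLabs Conversational AI")),
    Tier("Mid-market", 50_000, "$3,000–$7,000",
         ("Retell AI", "Vapi", "Bland AI")),
    Tier("Enterprise", float("inf"), "$5,000–$20,000+",
         ("Amazon Connect", "Google CCAI")),  # plus custom quotes from the others
]


def tier_for(monthly_minutes: int) -> Tier:
    """Return the first tier whose upper bound exceeds the given volume."""
    for tier in TIERS:
        if monthly_minutes < tier.max_minutes:
            return tier
    return TIERS[-1]


tier = tier_for(12_000)
print(f"{tier.name}: budget {tier.budget}/month, start with {', '.join(tier.shortlist)}")
```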
Choosing an LLM for Voice Agent Intelligence
The LLM powering your voice agent’s reasoning directly affects both cost and conversation quality. Claude from Anthropic has emerged as a strong option for voice agent backends due to its instruction-following accuracy and nuanced conversation handling. For developers building and testing voice agent logic, AI-native code editors like Cursor can accelerate prompt engineering and integration development significantly.
Frequently Asked Questions
What is the cheapest voice AI platform for low-volume usage?
For under 1,000 minutes per month, Bland AI at $0.09/min connected offers the most predictable costs because telephony, STT, LLM, and TTS are bundled into a single rate. Vapi can be slightly cheaper ($0.08–$0.10/min total) if you optimize provider selection, but requires managing multiple vendor accounts. Both are significantly cheaper than AWS or Google at low volumes.
How does Vapi pricing work with external providers?
Vapi charges a $0.05/min platform fee for orchestrating your voice agent. On top of that, you pay each provider (STT, LLM, TTS) at their standard API rates. Vapi passes these costs through without markup. A typical configuration using Deepgram STT, Claude or GPT-4o for reasoning, and ElevenLabs for TTS totals $0.08–$0.15/min depending on conversation complexity and model selection. You receive separate bills from Vapi and each provider.
Is Amazon Connect cost-effective for voice AI?
Amazon Connect is cost-effective at scale (50,000+ minutes/month) for organizations that need enterprise compliance certifications (HIPAA, PCI DSS, SOC 2) and are already in the AWS ecosystem. At low volumes, the per-minute cost ($0.15–$0.25) is 40–80% higher than purpose-built platforms, and the engineering investment to set up Connect + Lex + Polly + Bedrock is substantial. The per-second billing model does save money on short calls compared to platforms that bill per minute.
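The short-call effect is easy to quantify. The sketch below compares per-second billing with billing that rounds each call up to the next whole minute, using the same rate for both so that only the rounding differs. Whole-minute rounding is an assumption here for illustration; check each platform's billing increment before relying on it.

```python
import math


def per_second_cost(seconds: int, rate_per_min: float) -> float:
    """Cost when every second is billed at rate_per_min / 60."""
    return seconds * rate_per_min / 60


def whole_minute_cost(seconds: int, rate_per_min: float) -> float:
    """Cost when each call is rounded up to the next whole minute (assumed policy)."""
    return math.ceil(seconds / 60) * rate_per_min


# A 70-second call at $0.12/min under both schemes.
exact = per_second_cost(70, 0.12)
rounded = whole_minute_cost(70, 0.12)
print(f"per-second: ${exact:.2f}, whole-minute: ${rounded:.2f}")
```

The gap shrinks as calls get longer, because at most one partial minute is rounded up per call.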
What hidden costs should I budget for with voice AI?
The most commonly overlooked costs are: telephony (Twilio carrier fees adding $85–$130/month at 10K minutes, plus phone number fees), premium voice models (neural voices cost 4x standard on AWS/Google), testing minutes during development (500–2,000 minutes/month), and support tier upgrades needed for production SLAs. Budget 20–35% above your projected per-minute costs to account for these.
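Applied to a projection, the 20–35% buffer looks like this. The function name and the sample volume and rate are illustrative assumptions.

```python
def budget_with_buffer(minutes: int, rate_per_min: float,
                       buffer_low: float = 0.20,
                       buffer_high: float = 0.35) -> tuple[float, float]:
    """Return (low, high) monthly budgets after adding the hidden-cost buffer."""
    base = minutes * rate_per_min
    return base * (1 + buffer_low), base * (1 + buffer_high)


# Example: 10,000 minutes projected at $0.10/min.
low, high = budget_with_buffer(10_000, 0.10)
print(f"Budget ${low:,.0f} to ${high:,.0f} per month")
```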
How much does ElevenLabs Conversational AI cost per minute?
ElevenLabs Conversational AI costs $0.08–$0.12 per minute depending on your subscription tier and voice model selection. This rate bundles STT, LLM orchestration, and TTS. If you bring your own LLM instead of using ElevenLabs’ hosted option, you pay your LLM provider separately, which can add $0.01–$0.04/min depending on the model. Higher subscription tiers (Scale at $330/month, Business custom) offer lower per-minute rates and larger included minute allocations.
Can I switch voice AI platforms without rebuilding my agent?
Partially. Platforms like Vapi and Retell AI that use modular provider architectures make it relatively easy to swap individual components (change TTS from ElevenLabs to Play.ai, for example) without rebuilding the entire agent. Switching between orchestration platforms (Vapi to Bland, or Retell to Amazon Connect) requires more significant rework because each platform has its own conversation design paradigm, webhook structure, and tool-calling interface. Plan for 2–6 weeks of engineering time for a full platform migration at production scale.
Disclosure: This article contains affiliate links. If you purchase through these links, we may earn a commission at no additional cost to you. We only recommend tools we believe provide genuine value. All pricing data is sourced from official vendor documentation and may change—verify current rates before making purchasing decisions.