Foundations · Track and understand costs · Module 1 of 3
Guides
February 25, 2026
By Andrew Day

Current AI API Pricing March 2026: OpenAI, Grok, Anthropic, Gemini

Current March 2026 AI API pricing: OpenAI, xAI Grok, Anthropic, Gemini, Bedrock, Mistral, Cursor, and Hugging Face in one comparison table.


Use this when you are evaluating AI API providers and need current pricing in a consistent format — or when you need to translate pricing into a rough monthly cost estimate for a specific workload.

The fast answer: most AI APIs charge per million tokens, split into input (the prompt you send) and output (the response you receive). Output tokens are priced higher than input tokens. The cheapest option for your workload depends on your prompt shape, not just the headline rate. A mid-tier model with short prompts can cost less than a cheap model with long prompts. This guide gives you the current numbers and a section on how to calculate what pricing actually means for your specific usage.

You're evaluating four AI APIs. You open four pricing pages. One charges by model family. One jumps to higher rates past 200K context. One bundles usage into seats and credits. None of them use the same units.

This guide does that work for you. All major AI API providers, same format, updated for March 2026. If you want to compare what these choices mean after deployment, see AI cost monitoring.

What changed in this update

  • Tightened this guide for high-intent "current pricing" queries for March 2026.
  • Added provider-specific API pricing headings and a FAQ for common pricing/official-page searches.
  • Refreshed review metadata and source-check date for this snapshot.

How to Read AI API Pricing

Most AI APIs charge per token. A token is roughly 0.75 words in English — a 1,000-word document is approximately 1,300 tokens. Pricing is quoted per 1 million tokens, split into input (the prompt you send) and output (the response the model generates).
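That rule of thumb can be turned into a quick back-of-envelope estimator. The 0.75-words-per-token ratio is a rough heuristic for English text, not an exact tokenizer result:

```python
def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English text: ~0.75 words per token."""
    return round(word_count / 0.75)

# A 1,000-word document lands near the ~1,300-token figure quoted above.
print(estimate_tokens(1000))  # → 1333
```

Real token counts vary by model and content (code and non-English text tokenize less efficiently), so use a provider's tokenizer for anything budget-critical.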

Why input and output are priced differently: generating output tokens is more computationally expensive than processing input tokens, so providers price output higher. For most chat-style tasks, output tokens represent 20–40% of total token volume but a higher share of cost. For long-context retrieval tasks (RAG), input tokens dominate.

Why context window size matters for cost: Some providers charge more for larger context windows — even at the same model tier. Sending a 200,000-token document to a model costs significantly more than sending a 2,000-token prompt, both in tokens consumed and sometimes in per-token rate. If you also need one view across infrastructure and model spend, see cloud + AI cost monitoring.

If you already know your workload shape, skip ahead to the more specific follow-on guides: cheapest AI API for chat, RAG, and coding, OpenAI vs Anthropic pricing, or long-context pricing above 200K tokens.

How to use this guide to make a decision

The pricing tables below tell you the list rate. To turn that into a cost estimate for your workload, you need three numbers from your own product:

  1. Average input tokens per request — how long is your prompt, including the system prompt, any retrieved context, and the user input?
  2. Average output tokens per request — how long is the model's typical response?
  3. Requests per month — how many calls will this workflow make at your expected usage level?

Then apply:

Monthly cost = [ (input_tokens/1,000,000 × input_rate)
               + (output_tokens/1,000,000 × output_rate) ]
               × requests_per_month

A worked example: A team is building a content moderation classifier. Each request has a 400-token input (content + system prompt) and a 50-token output (a classification label + brief reasoning). They expect 2 million requests per month.

Comparing two candidates:

| Model | Input ($/1M) | Output ($/1M) | Monthly cost |
|---|---|---|---|
| GPT-5 Mini | $0.25 | $2.00 | (400/1M × $0.25 + 50/1M × $2.00) × 2M requests = $400 |
| Claude Haiku 4.5 | $1.00 | $5.00 | (400/1M × $1.00 + 50/1M × $5.00) × 2M requests = $1,300 |

At this workload shape, Claude Haiku 4.5 costs 3.25x as much as GPT-5 Mini for the same task ($1,300 vs $400). If quality is comparable, that difference compounds quickly at scale.

The important insight is that "cheapest model" and "cheapest for this workload" are not the same question. A model with slightly higher input rates but much lower output rates can be cheaper for long-response workflows. Check both sides of the rate before deciding.
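The formula and the worked example above can be reproduced with a small helper. The rates are this guide's March 2026 snapshot figures:

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float,
                 requests_per_month: int) -> float:
    """Apply the per-million-token formula to a full month of requests."""
    per_request = (input_tokens / 1_000_000 * input_rate
                   + output_tokens / 1_000_000 * output_rate)
    return per_request * requests_per_month

# Content moderation workload: 400 input / 50 output tokens, 2M requests/month.
print(monthly_cost(400, 50, 0.25, 2.00, 2_000_000))  # GPT-5 Mini, ≈ $400
print(monthly_cost(400, 50, 1.00, 5.00, 2_000_000))  # Claude Haiku 4.5, ≈ $1,300
```

Swapping in your own three workload numbers is usually more informative than comparing headline rates directly.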


OpenAI API Pricing (March 2026)

OpenAI's default lineup has shifted from GPT-4o-era naming to GPT-5.x for general text workloads.

GPT-5.2 — $1.75 per 1M input tokens, $14.00 per 1M output tokens. Cached input: $0.175/1M.

GPT-5.2 Pro — $21.00 per 1M input tokens, $168.00 per 1M output tokens. Premium tier for highest-complexity tasks.

GPT-5 Mini — $0.25 per 1M input tokens, $2.00 per 1M output tokens. Cached input: $0.025/1M.

Cost controls worth using: Batch API gives 50% off input and output for async workloads, and cached prompt pricing materially changes costs for repeated long-context prompts. If OpenAI is a meaningful part of your bill, OpenAI cost monitoring helps you track those changes over time instead of only checking pricing tables.
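As a sketch of how much caching matters, here is the input cost of a repeated 10,000-token system prompt at this snapshot's GPT-5.2 rates ($1.75/1M standard, $0.175/1M cached), under the simplifying assumption that every call after the first hits the cache — real cache-hit behavior depends on OpenAI's caching rules:

```python
PROMPT_TOKENS = 10_000
CALLS = 1_000
STANDARD = 1.75 / 1_000_000   # $ per input token, GPT-5.2 list rate
CACHED = 0.175 / 1_000_000    # $ per cached input token

# Every call pays full price vs. only the first call paying full price.
no_cache = PROMPT_TOKENS * CALLS * STANDARD
with_cache = PROMPT_TOKENS * (STANDARD + (CALLS - 1) * CACHED)

print(f"no cache:   ${no_cache:.2f}")    # $17.50
print(f"with cache: ${with_cache:.2f}")  # ≈ $1.77
```

A 10x cached-input discount roughly translates to a 10x saving on the repeated portion of the prompt, which is why long system prompts are the first place to look for savings.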


Anthropic API Pricing (March 2026)

Anthropic's current family centers on Claude 4.x model tiers.

Claude Opus 4.6 — $5.00 per 1M input tokens, $25.00 per 1M output tokens.

Claude Sonnet 4.6 — $3.00 per 1M input tokens, $15.00 per 1M output tokens.

Claude Haiku 4.5 — $1.00 per 1M input tokens, $5.00 per 1M output tokens.

Important pricing behavior: for supported 1M-context models, requests over 200K input tokens move to higher long-context rates. Batch API pricing remains 50% lower than standard list rates.

If Claude is one of your top two candidates, OpenAI vs Anthropic pricing in 2026 goes deeper on when the higher list price is still the better value.


Google Gemini API Pricing (March 2026)

Google's production pricing now primarily uses Gemini 2.5, with Gemini 3.x available in preview tiers.

Gemini 2.5 Pro — $1.25 per 1M input tokens, $10.00 per 1M output tokens (standard pricing; higher rates apply above 200K input context).

Gemini 2.5 Flash — $0.30 per 1M input tokens, $2.50 per 1M output tokens.

Gemini 2.5 Flash Lite — $0.10 per 1M input tokens, $0.40 per 1M output tokens.

Gemini 3.1 Pro (preview on Vertex AI) — $2.00 per 1M input tokens, $12.00 per 1M output tokens at standard rates.

Context threshold still matters: several Gemini SKUs increase pricing beyond 200K input tokens, so long-context prompting can double costs quickly.
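Tiered pricing can be modeled as a step function on input size. Only the 200K threshold and the Gemini 2.5 Pro base input rate below come from this guide; the $2.50/1M above-threshold rate is a hypothetical placeholder, and whether the whole request or only the excess is repriced varies by provider — this sketch reprices the whole request, as Gemini-style tiering is commonly described:

```python
def tiered_input_cost(input_tokens: int,
                      base_rate: float,
                      long_rate: float,
                      threshold: int = 200_000) -> float:
    """Price the entire request at the long-context rate once it crosses
    the threshold."""
    rate = long_rate if input_tokens > threshold else base_rate
    return input_tokens / 1_000_000 * rate

# Gemini 2.5 Pro base rate from this guide; $2.50/1M above 200K is hypothetical.
print(tiered_input_cost(150_000, 1.25, 2.50))  # below threshold: $0.1875
print(tiered_input_cost(250_000, 1.25, 2.50))  # above threshold: $0.625
```

Note the discontinuity: a 250K-token request can cost more than 3x a 150K-token one, even though it is only 1.7x larger.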

For products built around large documents or retrieval-heavy prompts, read Long-Context AI Pricing in 2026 before you commit to a model.


AWS Bedrock and Amazon Nova API Pricing (March 2026)

AWS Bedrock is different from single-vendor model APIs: it is a model platform with multiple providers and service tiers.

For Amazon's first-party family, Amazon Nova is the key lineup to watch:

  • Nova Micro (text-focused, low-latency/cost tier)
  • Nova Lite (low-cost multimodal tier)
  • Nova Pro (higher-capability multimodal tier)
  • Nova Premier (most capable tier for complex reasoning/distillation workflows)

How pricing works on Bedrock/Nova:

  • On-demand token pricing (input and output) via Bedrock pricing matrix
  • Service tiers: Standard, Priority, and Flex
  • Batch inference support for selected models
  • Region and access path affect final effective rates

Because Bedrock pricing is published in a dynamic matrix and can vary by region/tier, treat Nova pricing as a configuration exercise (model + region + tier), not a single global number.
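One way to keep that configuration exercise honest in code is a rate card keyed by (model, region, tier) that fails loudly on unknown combinations. Every number below is a hypothetical placeholder, not a published Bedrock rate:

```python
# Hypothetical rate card: (model, region, tier) -> (input $/1M, output $/1M).
RATE_CARD = {
    ("nova-lite", "us-east-1", "standard"): (0.06, 0.24),
    ("nova-lite", "us-east-1", "priority"): (0.075, 0.30),
    ("nova-pro", "us-east-1", "standard"): (0.80, 3.20),
}

def bedrock_rates(model: str, region: str, tier: str) -> tuple:
    """Look up a fully-specified configuration; never assume a global price."""
    key = (model, region, tier)
    if key not in RATE_CARD:
        raise KeyError(f"No rate configured for {key}; check the Bedrock pricing matrix")
    return RATE_CARD[key]

print(bedrock_rates("nova-lite", "us-east-1", "standard"))
```

The design point is the KeyError: with a matrix-priced platform, a missing entry should force a lookup against the live pricing page rather than silently fall back to a default.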

If you are choosing between managed platforms rather than direct vendor APIs, Bedrock vs Vertex AI pricing: what teams actually pay is the better next read.


xAI Grok API Pricing (March 2026)

xAI's newer lineup is split between Grok 4.x premium models and fast/mini options.

grok-4-1-fast-reasoning — $0.20 per 1M input tokens, $0.50 per 1M output tokens (2M context).

grok-code-fast-1 — $0.20 per 1M input tokens, $1.50 per 1M output tokens (256K context).

grok-4-0709 — $3.00 per 1M input tokens, $15.00 per 1M output tokens.

grok-3-mini — $0.30 per 1M input tokens, $0.50 per 1M output tokens.

Extra costs to watch: xAI tool calls (web search, code execution, X search) are billed separately from token charges, and access/availability can vary by model tier and account context.


Mistral API Pricing (March 2026)

Mistral's model lineup has moved to newer generations (Large 3, Medium 3, Small 3.x, Nemo). Their public pricing page is dynamic, and rates can change per deployment path (La Plateforme, cloud partners, or self-hosted).

Commonly reported API rates (verify current rates at checkout before committing to production):

  • Mistral Large 3 — around $2.00 input / $6.00 output per 1M tokens
  • Mistral Small 3.x — around $0.06-$0.20 input / $0.18-$0.60 output per 1M tokens (variant-dependent)
  • Mistral Nemo — often listed in the low-cost tier (single-digit cents per 1M tokens)

If you're cost-optimizing aggressively, validate the exact SKU and deployment region at purchase time, not from cached comparison pages.


Cursor Pricing (March 2026)

Cursor moved from the older "fast requests" messaging to plan tiers with usage multipliers and credits.

Hobby — Free. Limited Agent requests and limited Tab completions.

Pro — $20/month. Extended Agent limits, unlimited Tab, cloud agents, max context windows.

Pro+ — $60/month. Higher included usage (a 3x usage envelope across OpenAI/Claude/Gemini models).

Ultra — $200/month. Highest usage envelope (20x) and priority feature access.

Teams — $40/user/month. Centralized billing, shared assets, usage analytics/reporting, RBAC, and SAML/OIDC SSO.

The practical implication: for individual heavy users, Pro+ or Ultra is now the realistic comparison, not only Pro. For engineering teams that need seat and usage visibility in the same place, see Cursor cost monitoring.


Hugging Face API and Endpoint Pricing (March 2026)

Hugging Face now emphasizes Inference Providers routing with transparent pass-through billing.

Monthly credits:

  • Free users: $0.10/month
  • PRO users: $2.00/month
  • Team/Enterprise: $2.00 per seat/month

After credits, usage is pay-as-you-go, and HF states no markup over provider pricing for routed calls.

Inference Endpoints (dedicated): hourly compute billing by instance type. Typical range starts around $0.033/hour for small CPU and extends upward for GPU tiers (for example, A10G at $1.00/hour for 1x on listed AWS configs). If you use Hugging Face alongside direct model vendors, Hugging Face cost monitoring gives you the same daily view across providers.

If you are deciding whether to route through Hugging Face or call providers directly, read Hugging Face vs Direct Provider APIs: Cost Trade-offs in 2026.


Master Comparison Table (March 2026 Snapshot)

| Model | Input ($/1M) | Output ($/1M) | Context Window | Notes |
|---|---|---|---|---|
| GPT-5.2 | $1.75 | $14.00 | provider-dependent | Cached input available |
| GPT-5 Mini | $0.25 | $2.00 | provider-dependent | Low-cost OpenAI tier |
| Claude Sonnet 4.6 | $3.00 | $15.00 | up to 1M (tiered) | Long-context premium above 200K |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | Fast, lower-cost Claude tier |
| Gemini 2.5 Pro | $1.25 | $10.00 | up to 1M (tiered) | Higher rates above 200K input |
| Gemini 2.5 Flash | $0.30 | $2.50 | large context | Strong throughput/cost balance |
| Gemini 2.5 Flash Lite | $0.10 | $0.40 | large context | Cheapest Gemini text tier |
| Amazon Nova (Bedrock) | varies by model/tier | varies by model/tier | Micro 128K, Lite/Pro up to 300K, Premier up to 1M | Region and service tier impact price |
| grok-4-1-fast-reasoning | $0.20 | $0.50 | 2M | Separate tool-call charges apply |
| grok-4-0709 | $3.00 | $15.00 | 256K | Premium Grok tier |
| Mistral Large 3* | ~$2.00 | ~$6.00 | ~128K | Dynamic pricing page |
| Mistral Small 3.x* | ~$0.06-$0.20 | ~$0.18-$0.60 | variant-dependent | Confirm exact SKU rate |

* Mistral rates are included as market-quoted ranges; Mistral does not currently publish static official rate tables in a crawlable format.


What This Costs in Practice (Updated)

A typical product team using AI across a few different tasks:

  • GPT-5 Mini for app chat — 500,000 input tokens/day, 150,000 output tokens/day -> ~$0.13/day input + ~$0.30/day output -> ~$13/month
  • Gemini 2.5 Flash for high-volume classification — 5M input/day, 500K output/day -> ~$1.50/day input + ~$1.25/day output -> ~$83/month
  • Cursor Teams for 10 developers -> $400/month
  • Hugging Face Endpoint (small CPU) always-on baseline -> ~$24/month at ~$0.033/hour
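The line items above combine in a quick sketch (rates and volumes from this snapshot, 30-day month assumed):

```python
DAYS = 30

# Per-token workloads: (input tokens/day, output tokens/day, input $/1M, output $/1M).
token_workloads = {
    "GPT-5 Mini chat": (500_000, 150_000, 0.25, 2.00),
    "Gemini 2.5 Flash classification": (5_000_000, 500_000, 0.30, 2.50),
}

monthly = {
    name: (tin / 1e6 * rin + tout / 1e6 * rout) * DAYS
    for name, (tin, tout, rin, rout) in token_workloads.items()
}
# Flat-rate line items.
monthly["Cursor Teams (10 seats)"] = 10 * 40.0
monthly["HF small CPU endpoint (always-on)"] = 0.033 * 24 * DAYS

for name, cost in monthly.items():
    print(f"{name}: ${cost:.2f}")
print(f"total: ${sum(monthly.values()):.2f}")  # ≈ $519
```

Keeping the estimate as a dict of named line items makes it obvious which provider dominates the bill — here the seat-based Cursor spend, not the token APIs.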

Combined estimate: ~$520/month before overages, tool calls, or long-context premiums. The important operational point is not the exact number, but how quickly the combined bill becomes hard to track once several providers are live. That is where AI cost monitoring or a broader cloud + AI cost monitoring layer becomes useful.

If you are budgeting for a small company rather than a single product workflow, How Much AI API Spend Should a Startup Expect Per Month? gives a better planning lens.


FAQ (March 2026 pricing queries)

What is the current xAI Grok API pricing in March 2026?

In this March 2026 snapshot, grok-4-1-fast-reasoning is listed at $0.20 per 1M input tokens and $0.50 per 1M output tokens, while grok-4-0709 is listed at $3.00 input and $15.00 output per 1M tokens. xAI tool calls are billed separately from token usage.

What is the current OpenAI API pricing in March 2026?

In this guide's March 2026 snapshot, GPT-5.2 is listed at $1.75 input and $14.00 output per 1M tokens, and GPT-5 Mini is listed at $0.25 input and $2.00 output. Batch API and cached input pricing can materially reduce effective cost.

What is the current Anthropic API pricing in March 2026?

This snapshot lists Claude Opus 4.6 at $5.00 input / $25.00 output, Claude Sonnet 4.6 at $3.00 / $15.00, and Claude Haiku 4.5 at $1.00 / $5.00 per 1M tokens, with long-context premiums above 200K input tokens on supported tiers.

Where is the official OpenAI API pricing page?

Check OpenAI's official pricing page directly; treat the rates in this guide as a dated snapshot.

Where is the official xAI Grok API pricing page?

Check xAI's official models and pricing documentation directly; treat the rates in this guide as a dated snapshot.

Tracking AI API Spend Across Providers

The challenge with multi-provider AI usage isn't any single bill — it's the aggregate. When OpenAI, Anthropic, Cursor, and Hugging Face all invoice separately, on different cycles, with different units, the combined picture doesn't exist unless you build it.

Connect all your AI providers to StackSpend for a single view of total AI API spend, daily anomaly detection (with webhooks to push alerts to your systems), and pace-to-forecast alerts. Setup guides: OpenAI, Anthropic, Cursor, Hugging Face, GCP (Gemini via Vertex).



