Foundations · Track and understand costs · Module 2 of 3
Guides
March 8, 2026
By Andrew Day

How to Track LLM API Spend Across OpenAI, Anthropic, Bedrock, Vertex, and Azure OpenAI

A practical guide for developers, product teams, and engineering leaders who need to track LLM API spend by provider, model, feature, team, and customer before the invoice arrives.


Use this when your team is using more than one AI provider and you cannot explain last month's variance — or when you need to set up a tracking system before usage grows too large to manage.

The fast answer: pull provider data daily, normalize it into a shared schema with consistent dimensions (provider, model, feature, environment, team), track input and output tokens separately, and alert on daily anomalies rather than waiting for month-end invoices. The hard part is not collecting a bill total — it is understanding which provider, model, and workflow caused that total to move. This guide gives you a working structure for that.

If you also need a unified view after rollout, see AI cost monitoring and cloud + AI cost monitoring.

A scenario that makes the stakes clear

A 12-person product team uses OpenAI for their customer-facing assistant and AWS Bedrock for a background classification pipeline. Both bills arrived last month. OpenAI was $4,200 — up $900 from the month before. Bedrock was $1,800 — up $400.

The CTO wants to know why spend jumped $1,300. The engineering lead opens the OpenAI dashboard and sees total token usage by model, then opens AWS Cost Explorer and sees Bedrock spend by region. Both views are useful. Neither answers the question: which feature, workflow, or customer segment caused the increase?

Without a normalized tracking layer, the answer is "we're not sure — could be the assistant, could be the classification jobs, could be a prompt change." With one, the answer is: "The assistant cost grew $600 because we enabled it for a new customer tier on the 12th. The classification pipeline grew $700 because we switched from Nova Lite to Nova Pro for accuracy. Both were expected." That 20-minute investigation becomes a 2-minute look-up.

The rest of this guide shows you how to build the layer that makes that possible.

Quick answer: how do you track LLM API spend properly?

If you need to know how to track LLM usage in production or monitor LLM costs across providers, a good setup does five things:

  1. Pulls usage or cost data from the provider source of truth every day.
  2. Normalizes all providers into the same dimensions: provider, model, project, feature, environment, team, and customer.
  3. Separates volume changes from price changes, so you can tell whether more requests or a more expensive model caused the increase.
  4. Alerts on daily pace and anomalies instead of waiting for month-end invoices.
  5. Keeps cloud-routed model usage visible too, because Bedrock, Vertex, and Azure OpenAI often disappear into broader cloud billing unless you tag them carefully.

That means OpenAI and Anthropic are only half the picture. Many teams also need to track AWS Bedrock, Vertex AI, or Azure OpenAI inside the same reporting layer. If you are specifically building an ownership and dashboard layer around that data, see AI cost observability.
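Step 2, the shared schema, is where most of the leverage is. A minimal sketch of what "normalize into the same dimensions" can look like in Python; the input `row` shape here is a hypothetical flattened record you might build from a provider's reporting data, not a real API response:

```python
from dataclasses import dataclass

@dataclass
class SpendRecord:
    # The shared dimensions every provider's data is normalized into.
    day: str            # ISO date, e.g. "2026-03-01"
    provider: str       # "openai", "anthropic", "bedrock", ...
    model: str          # exact model name returned by the provider
    project: str
    feature: str        # stable feature key, not a free-form label
    environment: str    # "prod", "staging", "dev"
    team: str
    input_tokens: int
    output_tokens: int  # tracked separately from input tokens
    cost_usd: float

def normalize_openai(row: dict) -> SpendRecord:
    # `row` is an assumed intermediate shape; write one of these
    # small adapters per provider so everything lands in SpendRecord.
    return SpendRecord(
        day=row["date"],
        provider="openai",
        model=row["model"],
        project=row.get("project_id", "unknown"),
        feature=row.get("feature", "unattributed"),
        environment=row.get("env", "prod"),
        team=row.get("team", "unowned"),
        input_tokens=row.get("input_tokens", 0),
        output_tokens=row.get("output_tokens", 0),
        cost_usd=row["cost_usd"],
    )

record = normalize_openai({
    "date": "2026-03-01", "model": "gpt-4o-mini",
    "input_tokens": 120_000, "output_tokens": 30_000, "cost_usd": 0.54,
})
print(record.provider, record.cost_usd)  # openai 0.54
```

The point of the dataclass is that every downstream view (dashboards, alerts, forecasts) reads one shape, regardless of which provider the row came from.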

Why LLM API spend is harder to track than it looks

LLM API spend becomes messy fast because providers bill differently and products use models in different ways.

  • OpenAI and Anthropic give you direct API usage and cost data, but only if you structure projects, API keys, and filters clearly.
  • Bedrock, Vertex, and Azure OpenAI often land inside cloud billing, where model spend is mixed with storage, networking, and other infrastructure.
  • One feature might call multiple models in one user flow: embeddings, reranking, chat generation, and fallback routing.
  • Product teams talk about features and customers, while finance sees invoices and engineering sees requests.

That is why a raw provider dashboard is not enough. It tells you what the provider billed. It usually does not tell you what changed in your product.

What data should you capture on every LLM request?

At minimum, each request needs enough metadata to answer, "what was this for?"

Field | Why it matters | Good default
Provider | Lets you compare OpenAI, Anthropic, Bedrock, Vertex, and Azure OpenAI in one view | Always capture explicitly
Model | Separates volume growth from routing to a more expensive tier | Store the exact model name returned by the provider
Feature or workflow | Connects spend to product decisions | Use a stable feature key, not free-form labels
Project or environment | Separates production from staging and internal tools | At least prod, staging, dev
Team or owner | Makes reviews and accountability possible | Route via service or cost owner
Customer or tenant | Supports margin analysis and enterprise debugging | Track customer ID where relevant
Input and output tokens | Explains whether cost changed because prompts or responses got larger | Store both, not only total tokens
Request outcome | Shows whether retries, errors, or loops are inflating usage | Success, retry, failure, timeout

If you want finance and product teams to use the same numbers, do not stop at provider and model. Add ownership dimensions early. How to attribute AI costs by feature, team, and customer goes deeper on that part.
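In application code, capturing these fields usually means wrapping every model call so the metadata is attached at the moment the request happens. A hedged sketch; `call_model` is a placeholder for whatever SDK call you use, and the response is assumed to carry a `usage` dict with token counts:

```python
import time

def tracked_call(call_model, prompt, *, provider, model, feature,
                 environment, team, customer=None, log=print):
    """Wrap a model call so every request carries ownership metadata.

    `call_model` is a placeholder for your SDK call; it is assumed to
    return a dict with a "usage" entry holding token counts."""
    started = time.time()
    outcome = "success"
    usage = {"input_tokens": 0, "output_tokens": 0}
    try:
        response = call_model(model=model, prompt=prompt)
        usage = response.get("usage", usage)
        return response
    except TimeoutError:
        outcome = "timeout"
        raise
    except Exception:
        outcome = "failure"
        raise
    finally:
        # Logged on every path, including errors, so retries and
        # failures show up in the spend data instead of vanishing.
        log({
            "provider": provider, "model": model, "feature": feature,
            "environment": environment, "team": team, "customer": customer,
            "input_tokens": usage["input_tokens"],
            "output_tokens": usage["output_tokens"],  # kept separate
            "outcome": outcome,
            "latency_s": round(time.time() - started, 3),
        })
```

Swap `log` for a write to your telemetry pipeline; the wrapper shape stays the same.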

How do you track direct-provider LLM spend from OpenAI and Anthropic?

For direct APIs, the cleanest pattern is to use the provider's own usage and cost endpoints as your source of truth, then enrich that data with your internal metadata.

OpenAI

As of March 2026, OpenAI provides both organization usage and cost reporting endpoints, plus dashboard exports for CSV workflows. That makes OpenAI one of the easier providers to track programmatically, provided you use project structure and API keys consistently.

Practical recommendation:

  • assign API keys or project boundaries to major product surfaces,
  • capture internal feature and customer metadata in your app logs,
  • and reconcile provider-side daily cost with your internal request-level view.

If you use Batch or cached prompts, watch those separately. They change unit economics materially even when request counts stay flat. See AI API pricing in 2026 if you need the pricing-side context.
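A sketch of reducing that provider-side data to daily totals. The bucketed response shape below matches OpenAI's organization costs reporting as of this writing, but treat the field names as assumptions and verify them against the current API reference; the aggregation logic is the durable part:

```python
from collections import defaultdict

# For orientation (admin key required; fetching is ordinary HTTP):
#   GET https://api.openai.com/v1/organization/costs?start_time=...&bucket_width=1d
# What matters here is reducing the bucketed response to daily totals.

def daily_totals(costs_page: dict) -> dict:
    """Reduce a bucketed costs response to {bucket_start_time: usd_total}."""
    totals = defaultdict(float)
    for bucket in costs_page.get("data", []):
        for result in bucket.get("results", []):
            amount = result.get("amount") or {}
            totals[bucket["start_time"]] += float(amount.get("value", 0.0))
    return dict(totals)

sample = {"data": [
    {"start_time": 1772928000, "results": [
        {"amount": {"value": 12.5, "currency": "usd"}},
        {"amount": {"value": 3.0, "currency": "usd"}},
    ]},
]}
print(daily_totals(sample))  # {1772928000: 15.5}
```

Store these daily totals as your billing source of truth, then join your internal metadata against them.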

Anthropic

Anthropic's Usage and Cost Admin API is strong for organizations that need workspace, model, and service-tier views. The catch is that you need an Admin API key, and that is an organizational workflow decision, not just an engineering one.

Practical recommendation:

  • use the Admin API for finance-grade reporting,
  • keep model-level trend views separate from feature attribution,
  • and watch long-context usage explicitly because token volume can jump without a visible traffic increase.

For both providers, the key mistake is treating raw provider data as your final dashboard. Provider data should be the billing source of truth, but your internal metadata should explain why the bill moved.
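Reconciling the two can be as simple as comparing daily totals and flagging days where they drift apart; a large drift usually means missing metadata in your telemetry, not a billing problem. A minimal sketch:

```python
def reconcile(provider_daily: dict, internal_daily: dict,
              tolerance: float = 0.05) -> list:
    """Flag days where internal request-level cost estimates drift more
    than `tolerance` (as a fraction) from the provider's billed total."""
    flagged = []
    for day, billed in provider_daily.items():
        if billed == 0:
            continue
        estimated = internal_daily.get(day, 0.0)
        drift = abs(billed - estimated) / billed
        if drift > tolerance:
            flagged.append((day, billed, estimated, round(drift, 3)))
    return flagged

provider = {"2026-03-01": 140.0, "2026-03-02": 150.0}
internal = {"2026-03-01": 139.0, "2026-03-02": 110.0}
print(reconcile(provider, internal))
# only 2026-03-02 drifts (about 27%), so only that day is flagged
```

A small steady drift is normal (rounding, caching discounts); a sudden jump in drift is a pipeline bug worth chasing.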

How do you track Bedrock, Vertex AI, and Azure OpenAI spend?

Cloud-routed LLM spend is where many teams lose visibility.

AWS Bedrock

Bedrock pricing and billing live inside AWS. If you only look at total AWS spend, Bedrock often gets buried inside a broader cloud bill. The practical fix is to use application inference profiles, cost allocation tags, and a reporting layer that can isolate generative AI usage from the rest of AWS.

If you run multiple teams, products, or tenants on Bedrock, this is worth doing early. Otherwise, you end up arguing over one undifferentiated AWS line item.
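With tags in place, isolating Bedrock is a filtered Cost Explorer query. The boto3 call below (commented out) needs AWS credentials, and the `SERVICE` value and tag key are assumptions to verify against your own account; the response-parsing function is the reusable part:

```python
# Sketch: isolate Bedrock spend and group it by a cost allocation tag.
#
# import boto3
# ce = boto3.client("ce")
# resp = ce.get_cost_and_usage(
#     TimePeriod={"Start": "2026-03-01", "End": "2026-03-08"},
#     Granularity="DAILY",
#     Metrics=["UnblendedCost"],
#     Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock"]}},
#     GroupBy=[{"Type": "TAG", "Key": "team"}],
# )

def bedrock_by_tag(resp: dict) -> dict:
    """Reduce a get_cost_and_usage response to {(day, tag): usd}."""
    out = {}
    for day in resp.get("ResultsByTime", []):
        start = day["TimePeriod"]["Start"]
        for group in day.get("Groups", []):
            tag = group["Keys"][0]          # e.g. "team$assistant"
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            out[(start, tag)] = out.get((start, tag), 0.0) + amount
    return out

sample = {"ResultsByTime": [{
    "TimePeriod": {"Start": "2026-03-01", "End": "2026-03-02"},
    "Groups": [
        {"Keys": ["team$assistant"],
         "Metrics": {"UnblendedCost": {"Amount": "41.2", "Unit": "USD"}}},
        {"Keys": ["team$classification"],
         "Metrics": {"UnblendedCost": {"Amount": "18.7", "Unit": "USD"}}},
    ]}]}
print(bedrock_by_tag(sample))
```

Untagged spend shows up under an empty tag value, which is itself a useful signal: it tells you how much of your Bedrock usage is still unattributed.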

Google Vertex AI

Vertex AI usage usually becomes operationally useful only after Cloud Billing export is enabled to BigQuery. Without that export, you can inspect the console, but it is much harder to build recurring reporting and cross-team views. That is the same pattern described in GCP billing export pitfalls that break cost visibility, just applied to model workloads.

Azure OpenAI

Azure OpenAI tracking usually depends on Azure cost management views plus disciplined subscription, resource group, and tagging structure. If different environments or teams share the same Azure footprint without good labeling, spend reviews get fuzzy quickly.

This is the big cross-provider rule: direct APIs expose model spend directly, but cloud-routed LLMs often need billing exports, tags, or cost allocation structure before the numbers become actionable.

What should an LLM spend dashboard show every day?

A useful daily view is not just "total spend so far this month." It should answer the questions operators actually ask when something moves.

View | What it should answer | Why it matters
Total daily spend | Did yesterday look normal? | Fast anomaly detection
Spend by provider | Which vendor moved? | Find the source quickly
Spend by model | Did routing or model choice change? | Separates volume from pricing tier changes
Spend by feature | Which workflow is driving cost? | Lets product teams act
Spend by customer or tenant | Is one account skewing usage? | Supports margin and support workflows
Month-end forecast | Where is current pace heading? | Prevents invoice surprises

If your dashboard cannot explain a change within a few clicks, it is a reporting artifact, not an operating tool.
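Answering "which vendor or feature moved?" is a grouped day-over-day diff over your normalized records. A minimal sketch, using plain dicts for brevity:

```python
from collections import defaultdict

def deltas_by(records: list, dim: str, today: str, yesterday: str) -> dict:
    """Day-over-day spend change per value of one dimension."""
    totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        totals[r["day"]][r[dim]] += r["cost_usd"]
    keys = set(totals[today]) | set(totals[yesterday])
    return {k: round(totals[today][k] - totals[yesterday][k], 2)
            for k in keys}

records = [
    {"day": "2026-03-01", "provider": "openai",  "cost_usd": 130.0},
    {"day": "2026-03-01", "provider": "bedrock", "cost_usd": 60.0},
    {"day": "2026-03-02", "provider": "openai",  "cost_usd": 135.0},
    {"day": "2026-03-02", "provider": "bedrock", "cost_usd": 95.0},
]
print(deltas_by(records, "provider", "2026-03-02", "2026-03-01"))
# bedrock moved (+35.0); openai is roughly flat (+5.0)
```

The same function answers the model and feature questions by changing `dim`, which is exactly why the normalized schema earns its keep.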

What alerts actually help before the invoice arrives?

The first alert should usually not be "monthly budget exceeded." That arrives too late.

Better default alerts:

  • daily spend above expected range,
  • sudden increase in one provider or one model,
  • feature-level cost spike,
  • prompt size jump,
  • retry or failure surge,
  • or forecasted month-end overspend.

This is the operational sequence that works best:

  1. Alert on daily anomaly.
  2. Check provider, model, and feature deltas.
  3. Confirm whether the change is still active.
  4. Fix the route, prompt, retry loop, or usage cap.

If you need concrete thresholds, use How to set AI and cloud alert thresholds. If an alert already fired, use How to investigate an AI spend spike.

When is a spreadsheet enough, and when do you need a monitoring layer?

For very small teams, a spreadsheet can work for a while, but only under narrow conditions.

Setup | Usually fine with spreadsheets | Usually needs a monitoring layer
One provider, one product, one team | Yes, if spend is low and changes are infrequent | When daily movement starts to matter
Two or more providers | Only briefly | Yes, because aggregation becomes manual and slow
Need feature or customer attribution | No | Yes
Need alerts before month-end | No | Yes
Bedrock, Vertex, or Azure OpenAI inside cloud billing | Rarely | Yes, unless billing exports and tags are already excellent

The rule of thumb is simple: if your spend review depends on exporting CSVs from more than one place, you are already past spreadsheet scale.

A practical implementation pattern that works

If you need one concrete recommendation, use this:

  1. Start with provider daily cost as the billing source of truth.
  2. Add request-level metadata for feature, environment, team, and customer.
  3. Normalize all providers into one schema.
  4. Track input tokens, output tokens, and request counts separately.
  5. Review daily anomalies and weekly trends.
  6. Forecast month-end spend from current pace.
  7. Investigate every spike by provider, model, and feature before trying broad cost optimization.

This is also why the "gateway vs direct API" decision matters. A gateway can improve consistency if it becomes the place where metadata, routing, and observability are enforced. If that decision is still open, read Direct provider API vs AI gateway: which should you use?

FAQ

What is the difference between LLM usage tracking and LLM spend tracking?

Usage tracking tells you requests, tokens, and model activity. Spend tracking tells you what those requests cost. You need both. Usage explains behavior. Spend explains impact.

Can I track LLM API spend by feature?

Yes, but only if your app attaches a stable feature key to each request or to the API key, project, or routing context that generated it. Otherwise you can see provider totals, but not product-level drivers.

How do I track LLM spend when it runs through AWS Bedrock or Vertex AI?

Treat cloud billing as part of the source of truth. For Bedrock, use inference profiles and cost allocation tags where possible. For Vertex AI, enable Cloud Billing export to BigQuery. The key is to separate model costs from the rest of cloud spend.

Should I trust provider dashboards or my own internal telemetry?

Use provider data as the billing source of truth and internal telemetry as the explanation layer. If they disagree, reconcile to provider cost first, then debug your metadata pipeline.

How often should I refresh LLM spend data?

Daily is the minimum useful cadence for most production teams. High-volume or high-risk systems may justify more frequent refreshes, but daily is the level where you can still catch problems before month-end.

What is the first alert I should set?

Set an alert for abnormal daily spend or abnormal provider/model movement, not just a month-end budget breach. Daily pace alerts give you time to respond.

Is one provider dashboard enough if I mostly use OpenAI?

Only if OpenAI is truly the only source of model spend and you do not need feature, customer, or forecast views. As soon as Anthropic, Cursor, Bedrock, Vertex, or Azure OpenAI enter the picture, you need a cross-provider view.

How do I monitor API spend across multiple providers?

Monitor LLM costs across providers by pulling daily cost or usage data from each provider (OpenAI, Anthropic, Bedrock, Vertex, Azure OpenAI), normalizing dimensions (provider, model, project, team), and reviewing totals and trends in one dashboard. API billing analytics tools that support multiple providers let you compare spend and detect anomalies before the invoice arrives.

How do I track LLM usage in production?

To track LLM usage in production, capture provider, model, feature, and customer metadata on each request, pull daily usage or cost data from provider APIs, and join it with your metadata in a reporting layer. Use projects (OpenAI), workspaces (Anthropic), or cloud tags (Bedrock, Vertex) for provider-level grouping. Production tracking works best when you establish the attribution model before scale.

Practical takeaway

Tracking LLM API spend well means building one shared view across providers, models, and categories. The billing source of truth should come from the providers. The explanation layer should come from your own product metadata. Once you combine those, alerts and forecasts become much more useful than end-of-month invoice review.

How StackSpend helps

StackSpend gives you the cross-provider layer that normalizes spend from OpenAI, Anthropic, AWS Bedrock, GCP Vertex, Azure OpenAI, Cursor, and others into a single daily view organized by provider, model, service, and category. That is the foundation the tracking pattern in this guide requires. Provider setup guides: OpenAI, Anthropic, AWS, GCP, Azure.

