Foundations · Track and understand costs · Module 2 of 3
Guides
March 8, 2026
By Andrew Day

How to Track LLM API Spend Across OpenAI, Anthropic, Bedrock, Vertex, and Azure OpenAI

A practical guide for developers, product teams, and engineering leaders who need to track LLM API spend by provider, model, feature, team, and customer before the invoice arrives.


Use this when your team is using more than one AI provider and you cannot explain last month's variance — or when you need to set up a tracking system before usage grows too large to manage.

The fast answer: pull provider data daily, normalize it into a shared schema with consistent dimensions (provider, model, feature, environment, team), track input and output tokens separately, and alert on daily anomalies rather than waiting for month-end invoices. The hard part is not collecting a bill total — it is understanding which provider, model, and workflow caused that total to move. This guide gives you a working structure for that.

If you also need a unified view after rollout, see AI cost monitoring and cloud + AI cost monitoring.

A scenario that makes the stakes clear

A 12-person product team uses OpenAI for their customer-facing assistant and AWS Bedrock for a background classification pipeline. Both bills arrived last month. OpenAI was $4,200 — up $900 from the month before. Bedrock was $1,800 — up $400.

The CTO wants to know why spend jumped $1,300. The engineering lead opens the OpenAI dashboard and sees total token usage by model, then opens AWS Cost Explorer and sees Bedrock spend by region. Both views are useful. Neither answers the question: which feature, workflow, or customer segment caused the increase?

Without a normalized tracking layer, the answer is "we're not sure — could be the assistant, could be the classification jobs, could be a prompt change." With one, the answer is: "The assistant cost grew $600 because we enabled it for a new customer tier on the 12th. The classification pipeline grew $700 because we switched from Nova Lite to Nova Pro for accuracy. Both were expected." That 20-minute investigation becomes a 2-minute look-up.

The rest of this guide shows you how to build the layer that makes that possible.

Quick answer: how do you track LLM API spend properly?

If you need to know how to track LLM usage in production or monitor LLM costs across providers, a good setup does five things:

  1. Pulls usage or cost data from the provider source of truth every day.
  2. Normalizes all providers into the same dimensions: provider, model, project, feature, environment, team, and customer.
  3. Separates volume changes from price changes, so you can tell whether more requests or a more expensive model caused the increase.
  4. Alerts on daily pace and anomalies instead of waiting for month-end invoices.
  5. Keeps cloud-routed model usage visible too, because Bedrock, Vertex, and Azure OpenAI often disappear into broader cloud billing unless you tag them carefully.

That means OpenAI and Anthropic are only half the picture. Many teams also need to track AWS Bedrock, Vertex AI, or Azure OpenAI inside the same reporting layer. If you are specifically building an ownership and dashboard layer around that data, see AI cost observability.
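Step 2, the shared schema, is where most of the leverage is. A minimal sketch of what "normalize into the same dimensions" can look like in Python; the input `row` shape here is a hypothetical flattened record you might build from a provider's reporting data, not a real API response:

```python
from dataclasses import dataclass

@dataclass
class SpendRecord:
    # The shared dimensions every provider's data is normalized into.
    day: str            # ISO date, e.g. "2026-03-01"
    provider: str       # "openai", "anthropic", "bedrock", ...
    model: str          # exact model name returned by the provider
    project: str
    feature: str        # stable feature key, not a free-form label
    environment: str    # "prod", "staging", "dev"
    team: str
    input_tokens: int
    output_tokens: int  # tracked separately from input tokens
    cost_usd: float

def normalize_openai(row: dict) -> SpendRecord:
    # `row` is an assumed intermediate shape; write one of these
    # small adapters per provider so everything lands in SpendRecord.
    return SpendRecord(
        day=row["date"],
        provider="openai",
        model=row["model"],
        project=row.get("project_id", "unknown"),
        feature=row.get("feature", "unattributed"),
        environment=row.get("env", "prod"),
        team=row.get("team", "unowned"),
        input_tokens=row.get("input_tokens", 0),
        output_tokens=row.get("output_tokens", 0),
        cost_usd=row["cost_usd"],
    )

record = normalize_openai({
    "date": "2026-03-01", "model": "gpt-4o-mini",
    "input_tokens": 120_000, "output_tokens": 30_000, "cost_usd": 0.54,
})
print(record.provider, record.cost_usd)  # openai 0.54
```

The point of the dataclass is that every downstream view (dashboards, alerts, forecasts) reads one shape, regardless of which provider the row came from.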

Why LLM API spend is harder to track than it looks

LLM API spend becomes messy fast because providers bill differently and products use models in different ways.

  • OpenAI and Anthropic give you direct API usage and cost data, but only if you structure projects, API keys, and filters clearly.
  • Bedrock, Vertex, and Azure OpenAI often land inside cloud billing, where model spend is mixed with storage, networking, and other infrastructure.
  • One feature might call multiple models in one user flow: embeddings, reranking, chat generation, and fallback routing.
  • Product teams talk about features and customers, while finance sees invoices and engineering sees requests.

That is why a raw provider dashboard is not enough. It tells you what the provider billed. It usually does not tell you what changed in your product.

What data should you capture on every LLM request?

At minimum, each request needs enough metadata to answer, "what was this for?"

Field | Why it matters | Good default
Provider | Lets you compare OpenAI, Anthropic, Bedrock, Vertex, and Azure OpenAI in one view | Always capture explicitly
Model | Separates volume growth from routing to a more expensive tier | Store the exact model name returned by the provider
Feature or workflow | Connects spend to product decisions | Use a stable feature key, not free-form labels
Project or environment | Separates production from staging and internal tools | At least prod, staging, dev
Team or owner | Makes reviews and accountability possible | Route via service or cost owner
Customer or tenant | Supports margin analysis and enterprise debugging | Track customer ID where relevant
Input and output tokens | Explains whether cost changed because prompts or responses got larger | Store both, not only total tokens
Request outcome | Shows whether retries, errors, or loops are inflating usage | Success, retry, failure, timeout

If you want finance and product teams to use the same numbers, do not stop at provider and model. Add ownership dimensions early. How to attribute AI costs by feature, team, and customer goes deeper on that part.
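In application code, capturing these fields usually means wrapping every model call so the metadata is attached at the moment the request happens. A hedged sketch; `call_model` is a placeholder for whatever SDK call you use, and the response is assumed to carry a `usage` dict with token counts:

```python
import time

def tracked_call(call_model, prompt, *, provider, model, feature,
                 environment, team, customer=None, log=print):
    """Wrap a model call so every request carries ownership metadata.

    `call_model` is a placeholder for your SDK call; it is assumed to
    return a dict with a "usage" entry holding token counts."""
    started = time.time()
    outcome = "success"
    usage = {"input_tokens": 0, "output_tokens": 0}
    try:
        response = call_model(model=model, prompt=prompt)
        usage = response.get("usage", usage)
        return response
    except TimeoutError:
        outcome = "timeout"
        raise
    except Exception:
        outcome = "failure"
        raise
    finally:
        # Logged on every path, including errors, so retries and
        # failures show up in the spend data instead of vanishing.
        log({
            "provider": provider, "model": model, "feature": feature,
            "environment": environment, "team": team, "customer": customer,
            "input_tokens": usage["input_tokens"],
            "output_tokens": usage["output_tokens"],  # kept separate
            "outcome": outcome,
            "latency_s": round(time.time() - started, 3),
        })
```

Swap `log` for a write to your telemetry pipeline; the wrapper shape stays the same.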

How do you track direct-provider LLM spend from OpenAI and Anthropic?

For direct APIs, the cleanest pattern is to use the provider's own usage and cost endpoints as your source of truth, then enrich that data with your internal metadata.

OpenAI

As of March 2026, OpenAI provides both organization usage and cost reporting endpoints, plus dashboard exports for CSV workflows. That makes OpenAI one of the easier providers to track programmatically, provided you use project structure and API keys consistently.

Practical recommendation:

  • assign API keys or project boundaries to major product surfaces,
  • capture internal feature and customer metadata in your app logs,
  • and reconcile provider-side daily cost with your internal request-level view.

If you use Batch or cached prompts, watch those separately. They change unit economics materially even when request counts stay flat. See AI API pricing in 2026 if you need the pricing-side context.
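A sketch of reducing that provider-side data to daily totals. The bucketed response shape below matches OpenAI's organization costs reporting as of this writing, but treat the field names as assumptions and verify them against the current API reference; the aggregation logic is the durable part:

```python
from collections import defaultdict

# For orientation (admin key required; fetching is ordinary HTTP):
#   GET https://api.openai.com/v1/organization/costs?start_time=...&bucket_width=1d
# What matters here is reducing the bucketed response to daily totals.

def daily_totals(costs_page: dict) -> dict:
    """Reduce a bucketed costs response to {bucket_start_time: usd_total}."""
    totals = defaultdict(float)
    for bucket in costs_page.get("data", []):
        for result in bucket.get("results", []):
            amount = result.get("amount") or {}
            totals[bucket["start_time"]] += float(amount.get("value", 0.0))
    return dict(totals)

sample = {"data": [
    {"start_time": 1772928000, "results": [
        {"amount": {"value": 12.5, "currency": "usd"}},
        {"amount": {"value": 3.0, "currency": "usd"}},
    ]},
]}
print(daily_totals(sample))  # {1772928000: 15.5}
```

Store these daily totals as your billing source of truth, then join your internal metadata against them.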

Anthropic

Anthropic's Usage and Cost Admin API is strong for organizations that need workspace, model, and service-tier views. The catch is that you need an Admin API key, and that is an organizational workflow decision, not just an engineering one.

Practical recommendation:

  • use the Admin API for finance-grade reporting,
  • keep model-level trend views separate from feature attribution,
  • and watch long-context usage explicitly because token volume can jump without a visible traffic increase.

For both providers, the key mistake is treating raw provider data as your final dashboard. Provider data should be the billing source of truth, but your internal metadata should explain why the bill moved.
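Reconciling the two can be as simple as comparing daily totals and flagging days where they drift apart; a large drift usually means missing metadata in your telemetry, not a billing problem. A minimal sketch:

```python
def reconcile(provider_daily: dict, internal_daily: dict,
              tolerance: float = 0.05) -> list:
    """Flag days where internal request-level cost estimates drift more
    than `tolerance` (as a fraction) from the provider's billed total."""
    flagged = []
    for day, billed in provider_daily.items():
        if billed == 0:
            continue
        estimated = internal_daily.get(day, 0.0)
        drift = abs(billed - estimated) / billed
        if drift > tolerance:
            flagged.append((day, billed, estimated, round(drift, 3)))
    return flagged

provider = {"2026-03-01": 140.0, "2026-03-02": 150.0}
internal = {"2026-03-01": 139.0, "2026-03-02": 110.0}
print(reconcile(provider, internal))
# only 2026-03-02 drifts (about 27%), so only that day is flagged
```

A small steady drift is normal (rounding, caching discounts); a sudden jump in drift is a pipeline bug worth chasing.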

How do you track Bedrock, Vertex AI, and Azure OpenAI spend?

Cloud-routed LLM spend is where many teams lose visibility.

AWS Bedrock

Bedrock pricing and billing live inside AWS. If you only look at total AWS spend, Bedrock often gets buried inside a broader cloud bill. The practical fix is to use application inference profiles, cost allocation tags, and a reporting layer that can isolate generative AI usage from the rest of AWS.

If you run multiple teams, products, or tenants on Bedrock, this is worth doing early. Otherwise, you end up arguing over one undifferentiated AWS line item.
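With tags in place, isolating Bedrock is a filtered Cost Explorer query. The boto3 call below (commented out) needs AWS credentials, and the `SERVICE` value and tag key are assumptions to verify against your own account; the response-parsing function is the reusable part:

```python
# Sketch: isolate Bedrock spend and group it by a cost allocation tag.
#
# import boto3
# ce = boto3.client("ce")
# resp = ce.get_cost_and_usage(
#     TimePeriod={"Start": "2026-03-01", "End": "2026-03-08"},
#     Granularity="DAILY",
#     Metrics=["UnblendedCost"],
#     Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock"]}},
#     GroupBy=[{"Type": "TAG", "Key": "team"}],
# )

def bedrock_by_tag(resp: dict) -> dict:
    """Reduce a get_cost_and_usage response to {(day, tag): usd}."""
    out = {}
    for day in resp.get("ResultsByTime", []):
        start = day["TimePeriod"]["Start"]
        for group in day.get("Groups", []):
            tag = group["Keys"][0]          # e.g. "team$assistant"
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            out[(start, tag)] = out.get((start, tag), 0.0) + amount
    return out

sample = {"ResultsByTime": [{
    "TimePeriod": {"Start": "2026-03-01", "End": "2026-03-02"},
    "Groups": [
        {"Keys": ["team$assistant"],
         "Metrics": {"UnblendedCost": {"Amount": "41.2", "Unit": "USD"}}},
        {"Keys": ["team$classification"],
         "Metrics": {"UnblendedCost": {"Amount": "18.7", "Unit": "USD"}}},
    ]}]}
print(bedrock_by_tag(sample))
```

Untagged spend shows up under an empty tag value, which is itself a useful signal: it tells you how much of your Bedrock usage is still unattributed.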

Google Vertex AI

Vertex AI usage usually becomes operationally useful only after Cloud Billing export is enabled to BigQuery. Without that export, you can inspect the console, but it is much harder to build recurring reporting and cross-team views. That is the same pattern described in GCP billing export pitfalls that break cost visibility, just applied to model workloads.

Azure OpenAI

Azure OpenAI tracking usually depends on Azure cost management views plus disciplined subscription, resource group, and tagging structure. If different environments or teams share the same Azure footprint without good labeling, spend reviews get fuzzy quickly.

This is the big cross-provider rule: direct APIs expose model spend directly, but cloud-routed LLMs often need billing exports, tags, or cost allocation structure before the numbers become actionable.

What should an LLM spend dashboard show every day?

A useful daily view is not just "total spend so far this month." It should answer the questions operators actually ask when something moves.

View | What it should answer | Why it matters
Total daily spend | Did yesterday look normal? | Fast anomaly detection
Spend by provider | Which vendor moved? | Find the source quickly
Spend by model | Did routing or model choice change? | Separates volume from pricing tier changes
Spend by feature | Which workflow is driving cost? | Lets product teams act
Spend by customer or tenant | Is one account skewing usage? | Supports margin and support workflows
Month-end forecast | Where is current pace heading? | Prevents invoice surprises

If your dashboard cannot explain a change within a few clicks, it is a reporting artifact, not an operating tool.
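Answering "which vendor or feature moved?" is a grouped day-over-day diff over your normalized records. A minimal sketch, using plain dicts for brevity:

```python
from collections import defaultdict

def deltas_by(records: list, dim: str, today: str, yesterday: str) -> dict:
    """Day-over-day spend change per value of one dimension."""
    totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        totals[r["day"]][r[dim]] += r["cost_usd"]
    keys = set(totals[today]) | set(totals[yesterday])
    return {k: round(totals[today][k] - totals[yesterday][k], 2)
            for k in keys}

records = [
    {"day": "2026-03-01", "provider": "openai",  "cost_usd": 130.0},
    {"day": "2026-03-01", "provider": "bedrock", "cost_usd": 60.0},
    {"day": "2026-03-02", "provider": "openai",  "cost_usd": 135.0},
    {"day": "2026-03-02", "provider": "bedrock", "cost_usd": 95.0},
]
print(deltas_by(records, "provider", "2026-03-02", "2026-03-01"))
# bedrock moved (+35.0); openai is roughly flat (+5.0)
```

The same function answers the model and feature questions by changing `dim`, which is exactly why the normalized schema earns its keep.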

What alerts actually help before the invoice arrives?

The first alert should usually not be "monthly budget exceeded." That arrives too late.

Better default alerts:

  • daily spend above expected range,
  • sudden increase in one provider or one model,
  • feature-level cost spike,
  • prompt size jump,
  • retry or failure surge,
  • or forecasted month-end overspend.

This is the operational sequence that works best:

  1. Alert on daily anomaly.
  2. Check provider, model, and feature deltas.
  3. Confirm whether the change is still active.
  4. Fix the route, prompt, retry loop, or usage cap.

If you need concrete thresholds, use How to set AI and cloud alert thresholds. If an alert already fired, use How to investigate an AI spend spike.

When is a spreadsheet enough, and when do you need a monitoring layer?

For very small teams, a spreadsheet can work for a while, but only under narrow conditions.

Setup | Usually fine with spreadsheets | Usually needs a monitoring layer
One provider, one product, one team | Yes, if spend is low and changes are infrequent | When daily movement starts to matter
Two or more providers | Only briefly | Yes, because aggregation becomes manual and slow
Need feature or customer attribution | No | Yes
Need alerts before month-end | No | Yes
Bedrock, Vertex, or Azure OpenAI inside cloud billing | Rarely | Yes, unless billing exports and tags are already excellent

The rule of thumb is simple: if your spend review depends on exporting CSVs from more than one place, you are already past spreadsheet scale.

A practical implementation pattern that works

If you need one concrete recommendation, use this:

  1. Start with provider daily cost as the billing source of truth.
  2. Add request-level metadata for feature, environment, team, and customer.
  3. Normalize all providers into one schema.
  4. Track input tokens, output tokens, and request counts separately.
  5. Review daily anomalies and weekly trends.
  6. Forecast month-end spend from current pace.
  7. Investigate every spike by provider, model, and feature before trying broad cost optimization.

This is also why the "gateway vs direct API" decision matters. A gateway can improve consistency if it becomes the place where metadata, routing, and observability are enforced. If that decision is still open, read Direct provider API vs AI gateway: which should you use?

FAQ

What is the difference between LLM usage tracking and LLM spend tracking?

Usage tracking tells you requests, tokens, and model activity. Spend tracking tells you what those requests cost. You need both. Usage explains behavior. Spend explains impact.

Can I track LLM API spend by feature?

Yes, but only if your app attaches a stable feature key to each request or to the API key, project, or routing context that generated it. Otherwise you can see provider totals, but not product-level drivers.

How do I track LLM spend when it runs through AWS Bedrock or Vertex AI?

Treat cloud billing as part of the source of truth. For Bedrock, use inference profiles and cost allocation tags where possible. For Vertex AI, enable Cloud Billing export to BigQuery. The key is to separate model costs from the rest of cloud spend.

Should I trust provider dashboards or my own internal telemetry?

Use provider data as the billing source of truth and internal telemetry as the explanation layer. If they disagree, reconcile to provider cost first, then debug your metadata pipeline.

How often should I refresh LLM spend data?

Daily is the minimum useful cadence for most production teams. High-volume or high-risk systems may justify more frequent refreshes, but daily is the level where you can still catch problems before month-end.

What is the first alert I should set?

Set an alert for abnormal daily spend or abnormal provider/model movement, not just a month-end budget breach. Daily pace alerts give you time to respond.

Is one provider dashboard enough if I mostly use OpenAI?

Only if OpenAI is truly the only source of model spend and you do not need feature, customer, or forecast views. As soon as Anthropic, Cursor, Bedrock, Vertex, or Azure OpenAI enter the picture, you need a cross-provider view.

How do I monitor API spend across multiple providers?

Monitor LLM costs across providers by pulling daily cost or usage data from each provider (OpenAI, Anthropic, Bedrock, Vertex, Azure OpenAI), normalizing dimensions (provider, model, project, team), and reviewing totals and trends in one dashboard. API billing analytics tools that support multiple providers let you compare spend and detect anomalies before the invoice arrives.

How do I track LLM usage in production?

To track LLM usage in production, capture provider, model, feature, and customer metadata on each request, pull daily usage or cost data from provider APIs, and join it with your metadata in a reporting layer. Use projects (OpenAI), workspaces (Anthropic), or cloud tags (Bedrock, Vertex) for provider-level grouping. Production tracking works best when you establish the attribution model before scale.

Practical takeaway

Tracking LLM API spend well means building one shared view across providers, models, and categories. The billing source of truth should come from the providers. The explanation layer should come from your own product metadata. Once you combine those, alerts and forecasts become much more useful than end-of-month invoice review.

How StackSpend helps

StackSpend gives you the cross-provider layer that normalizes spend from OpenAI, Anthropic, AWS Bedrock, GCP Vertex, Azure OpenAI, Cursor, and others into a single daily view organized by provider, model, service, and category. That is the foundation the tracking pattern in this guide requires. Provider setup guides: OpenAI, Anthropic, AWS, GCP, Azure.

