March 11, 2026
By Andrew Day

How to Forecast AI Costs in Production

Forecast AI costs in production using cost per request, growth assumptions, stress cases, and daily variance tracking instead of guesswork.


Use this when you need a forecast you can defend in front of your team, your board, or yourself.

The fast answer: forecast AI costs with a base case, a growth case, and a stress case. Use cost per request, requests per user, and expected usage growth as the inputs, then compare your daily pace against that forecast as the month progresses.

What you will get in 10 minutes

  • A simple AI cost forecasting model
  • A way to handle uncertainty without pretending precision
  • A daily review loop that tells you when the forecast is breaking
  • A guide for what to do when the forecast breaks mid-month

Why AI forecasts break — a story

Traditional infrastructure forecasts break when usage grows unexpectedly. AI forecasts break differently and more easily. The reason is that AI cost is a product of two variables, not one: volume (how many requests) and unit cost (how much each request costs). Both can change independently, and both can change quickly.

Here is what a typical breaking forecast looks like:

A team is building an AI-powered search feature. They budget $6,000 for the month based on their current request volume at $0.008 per request. At the end of week two, spend is already at $7,200 — tracking toward $14,400, more than double the budget.

What happened? Two things changed simultaneously. First, the product team shipped a new suggested-queries feature that increased average session depth from 3 queries to 5.8 queries. Request volume went up 93 percent. Second, an engineer updated the system prompt to include more context about the search domain — adding 800 tokens to every request. Cost per request increased from $0.008 to $0.011.

Either change alone would have caused an overrun. Together, they put spend on pace for 2.4x the original budget within two weeks.

The lesson is not that forecasting is impossible — it is that a good forecast needs to model both inputs explicitly and track them separately. A forecast that only monitors total spend cannot tell you whether the problem is volume growth, cost per request growth, or both.

The three inputs that matter most

At minimum, model these separately:

  • Cost per request — the average cost of one complete request in the workflow you are forecasting
  • Requests per day — the current daily volume
  • Expected daily growth rate — the percentage by which request volume is expected to grow

Simple formula:

Monthly inference cost =
cost per request × requests per day × 30

If you have user-level data:

Monthly inference cost =
cost per request × requests per user × active users

If your product has several AI workflows, create one line per workflow instead of one blended average. A blended average hides the cases where one workflow is cheap and growing slowly while another is expensive and growing fast.
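A minimal sketch of the per-workflow version of this formula, in Python. The workflow names and rates below are invented for illustration; pull real values from your own billing and request logs.

```python
def monthly_inference_cost(cost_per_request: float,
                           requests_per_day: float,
                           days_in_month: int = 30) -> float:
    """Monthly inference cost = cost per request x requests per day x days."""
    return cost_per_request * requests_per_day * days_in_month

# Hypothetical workflows -- one forecast line each, not one blended average.
workflows = {
    "chat_assistant": {"cost_per_request": 0.005, "requests_per_day": 30_000},
    "summarization":  {"cost_per_request": 0.015, "requests_per_day": 2_000},
}

for name, w in workflows.items():
    cost = monthly_inference_cost(w["cost_per_request"], w["requests_per_day"])
    print(f"{name}: ${cost:,.2f}/month")
```

Keeping each workflow as its own line item is what lets you see that a $0.015-per-request workflow growing fast matters more than a cheap one growing slowly.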

How to choose cost per request as an input

This is the step that most forecasts get wrong. Teams either guess at cost per request or use an out-of-date number from when the workflow was first built.

The right way to find your cost per request: Pull total inference spend for a specific workflow over the last 7 days and divide by the request count for the same period. Do this per workflow, not for all requests combined.

Cost per request =
total inference spend for workflow X over 7 days
÷ total requests for workflow X over 7 days

If you cannot break out costs by workflow yet, use total inference spend divided by total requests as an approximation. It is less accurate but better than guessing.
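The trailing 7-day calculation can be sketched as a small helper. The spend and request numbers here are invented; the inputs would come from whatever billing and logging data you can query per workflow.

```python
def cost_per_request(total_spend_7d: float, total_requests_7d: int) -> float:
    """Trailing 7-day average cost per request for a single workflow."""
    if total_requests_7d == 0:
        raise ValueError("no requests in the 7-day window")
    return total_spend_7d / total_requests_7d

# e.g. workflow X spent $350 on 70,000 requests over the last 7 days
print(cost_per_request(350.0, 70_000))  # 0.005
```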

Why this number changes: Cost per request is not static. It changes when:

  • System prompts are updated (more or fewer tokens)
  • Retrieved context grows or shrinks
  • The model tier changes (intentionally or via routing)
  • Response length expectations shift
  • New workflows start sharing the same API key and blend into the average

For stable workflows, cost per request is relatively consistent week to week. For workflows that are actively being developed, it can move by 20 to 40 percent between sprints. Track it at least weekly as part of your monitoring routine.

Account for variation by workflow: If you have multiple workflows, do not use one average cost per request across all of them. A document summarization workflow might cost $0.015 per request while a simple FAQ routing workflow costs $0.0008. Blending them hides the fact that the expensive one is the one growing fastest.

Build the base case first

Your base case should reflect what happens if the current product behavior continues without a major surprise.

Example:

| Metric | Value |
| --- | --- |
| Active users | 5,000 |
| Requests per user per day | 6 |
| Cost per request | $0.005 |

Base-case monthly inference forecast:

5,000 × 6 × 30 × $0.005 = $4,500

Then add the infrastructure around it:

  • compute for serving and workers
  • storage for logs and retrieval
  • networking for transfer and API traffic

The base case is not a prediction that things will stay the same — it is the anchor against which you will measure deviations. If something changes and spend diverges from the base case, the base case is what tells you how much.
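The base-case arithmetic above can be sketched directly from the user-level formula; the numbers are the article's example, not real data.

```python
def base_case_monthly(active_users: int,
                      requests_per_user_per_day: float,
                      cost_per_request: float,
                      days_in_month: int = 30) -> float:
    """Base-case inference forecast from the user-level formula."""
    return (active_users * requests_per_user_per_day
            * days_in_month * cost_per_request)

# The example above: 5,000 users x 6 requests/day x 30 days x $0.005
print(f"${base_case_monthly(5_000, 6, 0.005):,.2f}")  # $4,500.00
```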

Add a growth case

The growth case answers: what if usage expands the way the team hopes it will?

Keep the assumption explicit and simple. If you expect 25 percent more requests next month because a new feature is shipping, model that directly. Do not hide the assumption in the math.

Example:

Base requests per month: 900,000
Growth assumption: +25%
Growth requests per month: 1,125,000
At $0.005 per request = $5,625

The growth case is useful for planning conversations. When product or leadership asks "what does our AI cost look like if the feature launch goes well?", the growth case is the answer. It also establishes the ceiling of what you need to budget if growth materializes.
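Keeping the growth assumption as an explicit parameter makes the sketch trivially auditable; again, the 25 percent figure is the worked example, not a recommendation.

```python
def growth_case_monthly(base_requests_per_month: int,
                        growth_rate: float,
                        cost_per_request: float) -> float:
    """Growth-case cost: base volume scaled by the stated growth assumption."""
    return base_requests_per_month * (1 + growth_rate) * cost_per_request

# The example above: 900,000 requests, +25% growth, $0.005/request
print(f"${growth_case_monthly(900_000, 0.25, 0.005):,.2f}")  # $5,625.00
```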

Add a stress case

The stress case is the most useful part of the forecast, and most teams skip it.

The purpose of a stress case is not to scare anyone — it is to have a prepared answer for the scenario where two things go wrong at the same time, before that scenario happens. A team that has modeled a stress case can respond to a bad month with "we anticipated this scenario, and here is what we planned to do." A team without a stress case is always reacting.

What makes a realistic stress case for AI costs:

Stress cases are most useful when they combine volume growth with cost-per-request growth, because both can happen simultaneously (as in the story above). Good stress case inputs:

  • Requests up 40 percent (more than the growth case assumes)
  • Cost per request up 25 to 30 percent (a prompt change, model routing shift, or retrieved context growth)

Example:

Base requests: 900,000
Stress volume: +40% = 1,260,000 requests
Stress cost per request: $0.005 × 1.30 = $0.0065

Stress-case monthly cost:
1,260,000 × $0.0065 = $8,190

Compare that to the base case of $4,500. The stress case is 82 percent higher.
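Because the stress case multiplies two uplifts together, it is worth writing out explicitly; the function below is a sketch using the article's example numbers.

```python
def stress_case_monthly(base_requests_per_month: int,
                        volume_uplift: float,
                        cost_per_request: float,
                        cpr_uplift: float) -> float:
    """Stress case: volume AND cost per request move at the same time."""
    stressed_volume = base_requests_per_month * (1 + volume_uplift)
    stressed_cpr = cost_per_request * (1 + cpr_uplift)
    return stressed_volume * stressed_cpr

# The example above: +40% volume and +30% cost per request
cost = stress_case_monthly(900_000, 0.40, 0.005, 0.30)
print(f"${cost:,.0f}")  # $8,190
```

Note that the two uplifts compound: 1.40 x 1.30 is a 1.82x multiple of the base case, which is why stress cases that only flex one variable understate the risk.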

How to present a stress case to leadership without alarming them:

Frame the stress case as risk management, not a prediction. The language that works: "Our base case is $4,500. Our growth case, if the feature launch goes well, is $5,625. Our stress case — if usage grows faster than planned and we also have a prompt or model change — is $8,190. We have two levers to pull if we approach that ceiling: [specific optimization action 1] and [specific optimization action 2]."

Having the levers named in advance is what makes the stress case useful rather than alarming. It shows you have thought through the scenario rather than just discovered it.

Keep track of cost per feature, not just cost per org

If all AI usage is blended together, your forecast is hard to improve. A blended forecast cannot tell you which workflow is driving the variance — only that something is.

Break out major workflows such as:

  • chat assistant
  • background summarization
  • retrieval and embedding jobs
  • support automation
  • coding or agent workflows

This gives you:

  • cost per feature for comparison to product value
  • better prioritization when the forecast drifts — you can act on the specific workflow rather than the aggregate

Even two or three workflow categories are much more useful than one total.

Compare daily pace against forecast

The forecast is not a once-a-month exercise. The real value comes from checking it against actual pace daily — specifically:

  • month-to-date spend
  • implied month-end if current pace continues
  • variance vs forecast at current date

If you are 15 days into a 30-day month, you should be roughly at 50 percent of your forecast. Being at 65 percent on day 15 means you are running 30 percent ahead of pace. That is actionable information with 15 days left — enough time to investigate and potentially correct.

This is why daily tracking matters: it gives you time to act, not just a report of what already happened.
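The pace check above can be sketched as one small helper. The day counts and dollar figures are the article's example; the function itself is an illustration, not a StackSpend API.

```python
def pace_check(mtd_spend: float, day_of_month: int,
               days_in_month: int, forecast: float) -> tuple[float, float]:
    """Return (implied month-end spend, % variance vs forecast at this date)."""
    implied_month_end = mtd_spend / day_of_month * days_in_month
    expected_to_date = forecast * day_of_month / days_in_month
    variance_pct = (mtd_spend / expected_to_date - 1) * 100
    return implied_month_end, variance_pct

# Day 15 of 30, at 65% of a $5,000 forecast -> 30% ahead of pace
implied, variance = pace_check(3_250, 15, 30, 5_000)
print(f"implied month-end: ${implied:,.0f}, variance: {variance:+.0f}%")
```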

When the forecast breaks mid-month

This happens. The question is what to do about it.

A practical response sequence for when you are significantly over pace:

Step 1 — Quantify the gap. How far over pace are you, and what is the projected month-end at current rate? Turn "we are overspending" into a specific number: "We are on pace for $7,200 against a $5,000 forecast."

Step 2 — Isolate the cause. Is the overage in volume, cost per request, or both? Check each workflow's cost per request for the last 7 days against the baseline. Check request volume for the same period. Usually one or the other is the primary driver.

Step 3 — Identify the specific change. What happened around the time the overrun started? A deployment, a feature launch, a prompt change, a new customer segment going live? The engineering owner should be able to identify the change within 30 minutes if monitoring is in place.

Step 4 — Decide on a response. The options are: (a) correct the change if it was unintentional waste, (b) optimize the workflow if the change was intentional but the cost is too high, (c) accept the overrun if the cost is justified by product value and update the forecast, or (d) escalate if the overrun is large enough to require leadership awareness.

Step 5 — Update the forecast. Once you know what changed and whether it is being corrected, revise the month-end estimate. If the cause is corrected, revise it down. If the cause is accepted, revise it to reflect the new level. A stale forecast that no longer reflects reality is worse than no forecast.

A practical review model

Use this simple structure:

| Scenario | What it assumes | What you do with it |
| --- | --- | --- |
| Base | Current usage continues | Run the normal plan |
| Growth | Product usage grows as expected | Check whether budget still holds |
| Stress | Usage and cost per request both increase | Prepare corrective actions in advance |

You do not need a complex forecasting platform to start. You need a model that the team will actually review. A shared spreadsheet with these three scenarios, updated weekly, is meaningfully better than no forecast.
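As a sketch, the entire three-case model from this article's worked examples fits in a few lines; a shared spreadsheet with the same three rows works just as well.

```python
# Numbers reuse the worked examples above (900k requests, $0.005/request).
BASE_REQUESTS = 900_000
BASE_CPR = 0.005

scenarios = {
    "base":   BASE_REQUESTS * BASE_CPR,                # current usage continues
    "growth": BASE_REQUESTS * 1.25 * BASE_CPR,         # +25% volume
    "stress": BASE_REQUESTS * 1.40 * BASE_CPR * 1.30,  # +40% volume, +30% cost/request
}

for name, cost in scenarios.items():
    print(f"{name:>6}: ${cost:,.0f}")
```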

Forecasting is better with cross-provider analysis

If you use more than one AI or cloud provider, forecasting from a single vendor dashboard is incomplete.

An AI workflow can move spend across:

  • OpenAI or Anthropic for inference
  • AWS or GCP for worker compute
  • vector database storage
  • networking or egress

If inference cost is flat but your compute costs double because a new background summarization job is running constantly, a forecast that only covers inference will miss that completely.

Cross-provider analysis matters because it tells you whether the forecast problem is inference, infrastructure, or adjacent services growing around the AI workflow.

How StackSpend helps

StackSpend makes this workflow easier by providing:

  • cross-provider daily spend visibility
  • category-based analysis across inference, compute, storage, and networking
  • daily forecast vs budget tracking
  • faster review of cost changes by service and provider

That turns forecasting into an operating loop, not just a spreadsheet exercise.

Final take

A useful AI cost forecast does not try to predict everything. It gives the team:

  • a base case anchored in current behavior
  • a growth case that reflects product plans
  • a stress case that models two bad things at once
  • a daily pace check that catches deviations while there is still time to act

Build the three-case model first. Check your daily pace against it, and revisit the model weekly. Update assumptions when something changes. That is the forecasting practice of teams that never get surprised at month-end.

What to do next


Know where your cloud and AI spend stands — every day.

Connect providers in minutes. Get 90 days of visibility and start receiving daily cost updates before the invoice lands.

14-day free trial. No credit card required. Plans from $19/month.