How We Cut Token Costs by 92% for AI Agents That Browse the Web

LLM bills are almost always bigger than expected. When you look closely at where the tokens go, web browsing is usually the culprit. If your agent reads pages from the internet, you are almost certainly paying for an enormous amount of HTML scaffolding that your model does not need and cannot use.

We ran a proper benchmark on this in April 2026 and the numbers changed how we think about AI infrastructure cost entirely.

The actual cost breakdown

We gathered 200 URLs from five categories: news articles, developer documentation, e-commerce product pages, company blog posts, and Wikipedia articles. We then measured the token count of the raw HTML versus the clean extracted content for each one.

Raw HTML average: 8,420 tokens per page
Clean content average: 674 tokens per page
Reduction: 92% (average across 200 URLs)

At GPT-4o input pricing of $2.50 per million tokens, 1,000 raw HTML page reads per day cost about $21.05. The same 1,000 reads on clean extracted content cost about $1.69. That is roughly $19.36 saved per day, or about $580 per month, from one change.

The numbers broken down by content type

Page Type	Raw HTML (tokens)	Clean Extract (tokens)	Reduction
News articles	10,800	480	95%
E-commerce product pages	12,200	620	95%
Developer documentation	6,100	1,100	82%
Blog posts	7,400	740	90%
Wikipedia articles	5,900	980	83%

News and e-commerce pages saw the biggest reductions because they have the most peripheral content. Ads, related stories, sponsored content, product recommendation carousels, and dynamic widgets all add up. Documentation pages were the cleanest to begin with since they are already structured for reading.

This is not just about money

Cost savings are the headline number, but there is a second benefit that is arguably more valuable for agents: what we call signal density.

When you reduce the noise in your prompt, the model spends more of its reasoning capacity on content that actually matters. We ran qualitative tests on summarisation quality comparing raw HTML context against clean extracted context. The extracted version consistently produced more accurate summaries with fewer hallucinations.

This is consistent with how transformers work. Attention is spread across every token in the context, so when a large portion of the input is irrelevant markup, the content that matters has to compete with noise. Cleaner input tends to produce better output.

How the math compounds with usage

Single-request savings look modest. Multiply them across an agent that runs at scale and the picture changes quickly.

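// Cost projection: GPT-4o input pricing ($2.50 per 1M tokens)
const PRICE_PER_MILLION_TOKENS = 2.50
const dailyRequests = 5_000

const rawHtml = {
  avgTokens: 8_420,
  dailyCost: (8_420 * dailyRequests) / 1_000_000 * PRICE_PER_MILLION_TOKENS // $105.25
}

const cleaned = {
  avgTokens: 674,
  dailyCost: (674 * dailyRequests) / 1_000_000 * PRICE_PER_MILLION_TOKENS // about $8.43
}

// 30-day month; computed from the values above rather than hard-coded
const monthlySavings = (rawHtml.dailyCost - cleaned.dailyCost) * 30 // about $2,905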
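
To rerun this projection with your own figures, the same arithmetic can be wrapped in a small function. This is a minimal sketch, not production code: `projectMonthlySavings` and its parameter names are hypothetical, the defaults are the benchmark averages and GPT-4o input price quoted in this post, and a flat 30-day month is assumed.

```javascript
// Hypothetical helper: the name and parameters are illustrative, not part of any library.
// Defaults are the benchmark averages and GPT-4o input price quoted in this post.
function projectMonthlySavings({
  dailyRequests,
  rawTokens = 8_420,
  cleanTokens = 674,
  pricePerMillion = 2.5,
  daysPerMonth = 30, // assumption: a flat 30-day month
}) {
  // Input cost in USD for one day of requests at a given tokens-per-page figure.
  const dailyCost = (tokens) => (tokens * dailyRequests) / 1_000_000 * pricePerMillion
  const rawDaily = dailyCost(rawTokens)
  const cleanDaily = dailyCost(cleanTokens)
  return { rawDaily, cleanDaily, monthlySavings: (rawDaily - cleanDaily) * daysPerMonth }
}

// Savings scale linearly with request volume.
for (const dailyRequests of [1_000, 5_000, 50_000]) {
  const { monthlySavings } = projectMonthlySavings({ dailyRequests })
  console.log(`${dailyRequests} requests/day: $${monthlySavings.toFixed(2)} saved per month`)
}
```

Only input tokens are counted here. Output tokens and any caching discounts are deliberately left out, so treat the result as an estimate of the input-side saving only.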

The extraction layer pays for itself

When we ran these numbers internally, the case for building a proper extraction layer was obvious. Even if an extraction API costs you a few dollars per thousand requests, you come out far ahead when each clean request saves you multiple times that in LLM input costs.

Neureil costs $5 per month for 1,000 extractions on the Starter plan. If each of those extractions saves you an average of $0.019 in GPT-4o input costs versus raw HTML (based on the 7,746 token reduction at $2.50 per million), those 1,000 extractions save you $19 in LLM costs. You pay $5 to save $19.
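
The same comparison can be checked in a few lines. This sketch assumes the plan price and token figures quoted above. The function name `extractionNetBenefit` and its fields are hypothetical, and the calculation ignores output-token costs.

```javascript
// Hypothetical helper for the break-even arithmetic; names are illustrative only.
function extractionNetBenefit({
  extractions = 1_000,
  planCost = 5, // USD per month, Starter plan figure quoted above
  tokensSavedPerPage = 8_420 - 674, // 7,746 tokens, from the benchmark averages
  pricePerMillion = 2.5, // GPT-4o input price, USD per 1M tokens
} = {}) {
  const savedPerExtraction = tokensSavedPerPage / 1_000_000 * pricePerMillion
  const llmSavings = savedPerExtraction * extractions
  return {
    savedPerExtraction,
    llmSavings,
    net: llmSavings - planCost,
    // Tokens each extraction must save before the plan pays for itself.
    breakEvenTokensPerPage: planCost / extractions / pricePerMillion * 1_000_000,
  }
}

console.log(extractionNetBenefit())
```

With these inputs, the plan breaks even once each extraction saves about 2,000 input tokens, well below the 7,746-token average reduction measured in the benchmark.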

What to measure in your own setup

If you want to run this analysis on your own traffic, the process is simple. Pull a sample of 50 URLs your agent has recently visited. Count the tokens in the raw HTML using a tokeniser like tiktoken. Count the tokens in the extracted plain content. The ratio will tell you your exact waste percentage.
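
The measurement itself can be sketched as follows. This is an outline under stated assumptions: `wastePercentage` and `approxTokenCount` are hypothetical names, and the default counter uses a rough four-characters-per-token heuristic purely so the sketch runs without dependencies. For real numbers, pass in a counter backed by an actual tokeniser such as tiktoken.

```javascript
// Rough stand-in for a tokeniser: about 4 characters per token for English text.
// Assumption for illustration only; swap in a real tokeniser for real measurements.
const approxTokenCount = (text) => Math.ceil(text.length / 4)

// samples: array of { url, rawHtml, cleanText } collected from your agent's recent visits.
// countTokens: any function that maps a string to a token count.
function wastePercentage(samples, countTokens = approxTokenCount) {
  let rawTotal = 0
  let cleanTotal = 0
  for (const { rawHtml, cleanText } of samples) {
    rawTotal += countTokens(rawHtml)
    cleanTotal += countTokens(cleanText)
  }
  // Guard against an empty sample so we never divide by zero.
  if (rawTotal === 0) return { rawTotal, cleanTotal, wastePct: 0 }
  return { rawTotal, cleanTotal, wastePct: (1 - cleanTotal / rawTotal) * 100 }
}

// Toy sample with made-up content, just to show the call shape.
const samples = [
  {
    url: "https://example.com/post",
    rawHtml: "<html><head><script>/* tracking */</script></head><body><nav>Home</nav><p>Hello world</p></body></html>",
    cleanText: "Hello world",
  },
]
console.log(wastePercentage(samples))
```

Totals are summed across the whole sample before the ratio is taken, so large pages weigh more than small ones. If you would rather weight every page equally, compute the percentage per page and average those instead.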

Almost every team we have spoken to has been surprised by how high their number is. The waste is invisible unless you explicitly go looking for it.

See your token reduction live

Paste any URL into the demo on our homepage and see exactly how many tokens come back from clean extraction versus what raw HTML would cost you.
