From the team

What we learned building Neureil

Real write-ups on feeding web data to LLMs, cutting token costs, and the iterations that led to where we are today.

Our Journey
v0.1 — alpha
Late March 2026
We started with raw fetch
The first version was just a wrapper around Node's fetch. Pass a URL, get back raw HTML. Simple. We tested it with GPT-4o on a few news articles and the token counts were brutal. One BBC article came back at over 11,000 tokens. The model spent 80% of its budget reading nav bars, cookie notices, and comment sections before it hit the actual story.
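For flavor, v0.1 was essentially this (a sketch; fetchRaw is an illustrative name, not anything we shipped):

// v0.1, more or less: fetch the URL, return whatever comes down
async function fetchRaw(url) {
  const res = await fetch(url)
  return res.text() // raw HTML, nav bars, cookie notices and all
}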
v0.2 — readability layer
Early April 2026
Added Mozilla Readability, saw the first real drop
We dropped Mozilla's Readability library in and reran our 50-URL test set. Average token count went from around 8,400 down to 1,100 per page. That was the first moment we thought this could be a real product. But it still had issues: tables came out malformed, code blocks lost their indentation, and a lot of metadata was missing.
avg 8,400 tokens → 1,100 tokens
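The extraction step was the standard Readability recipe. Roughly this, assuming html is the fetched page and url its address:

import { Readability } from '@mozilla/readability'
import { JSDOM } from 'jsdom'

// parse the HTML into a DOM, then let Readability pull the article out
const dom = new JSDOM(html, { url }) // passing url helps resolve relative links
const article = new Readability(dom.window.document).parse()
// article.title, article.byline, article.textContent, ...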
v0.3 — structured output
Mid April 2026
Switched from plain text to structured JSON
Plain text output wasn't enough for agents that needed to understand page structure. We redesigned the output to be a structured JSON object with separate fields for title, author, published date, main content, and code blocks. This made it usable as context without any post-processing on the caller's side. We also fixed the table and code block handling.
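The schema has evolved since then (the current response shape appears further down), but the v0.3 output looked roughly like this, with illustrative field names:

// roughly the v0.3 shape (illustrative)
{
  "title": "...",
  "author": "...",
  "published": "...",
  "content": "...",
  "codeBlocks": ["..."]
}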
v0.4 — caching + benchmarks
Late April 2026
Built caching, ran the proper benchmark
We added a 24-hour response cache so repeated requests to the same URL are instant. Then we did a proper benchmark across 200 URLs covering news, documentation sites, e-commerce, and developer blogs. The result was an average 92% token reduction versus raw HTML. We published those numbers on the landing page and they have held up since. P95 latency for uncached requests was 847ms.
92% token reduction · 200 URL benchmark
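The cache itself is conceptually simple. A minimal in-memory sketch (the production version lives in shared storage; extract stands in for the real extraction path):

// 24-hour TTL cache, sketched with a Map
const TTL_MS = 24 * 60 * 60 * 1000
const cache = new Map()

async function cachedExtract(url) {
  const hit = cache.get(url)
  if (hit && Date.now() - hit.at < TTL_MS) return hit.data // cache hit: instant
  const data = await extract(url) // stand-in for the real extraction path
  cache.set(url, { at: Date.now(), data })
  return data
}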
v0.5 — JS rendering
Early May 2026
Added JavaScript rendering for SPAs
A lot of modern sites, especially documentation and product pages, render their content client-side with React or Vue. Plain fetch returned near-empty HTML for those. We added a headless rendering path via Puppeteer that kicks in when the initial fetch returns insufficient content. The routing logic picks the fastest path automatically so you never have to think about it.
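A sketch of that routing, using a simple byte-count heuristic as a stand-in for the real insufficient-content check:

import puppeteer from 'puppeteer'

const MIN_CONTENT_BYTES = 2048 // illustrative threshold, not our actual heuristic

async function getHtml(url) {
  const html = await fetch(url).then(r => r.text())
  if (html.length >= MIN_CONTENT_BYTES) return html // plain fetch was enough

  // SPA case: render client-side content in headless Chrome
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    await page.goto(url, { waitUntil: 'networkidle0' })
    return await page.content()
  } finally {
    await browser.close()
  }
}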
v1.0 — public launch
May 13, 2026
Launched publicly with API keys and billing
After six weeks of internal testing, with a few friends running it in their agents, we launched publicly with a full API key dashboard, usage tracking, and two pricing tiers. The product you are reading about today. Still lots to build.
Latest posts
// before neureil — raw html as context
const html = await fetch(url).then(r => r.text())
// 9,200 tokens. mostly nav & ads.
 
// after — clean extraction
const data = await fetch('https://api.neureil.com/extract', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({ url })
}).then(r => r.json())
// 740 tokens. just the content.
How to Feed Web Data to an LLM Without Wasting 90% of Your Context Window
Raw HTML is the worst possible input for a language model. Here is what actually happens when you pass a webpage as context, why it breaks your agents, and how to fix it.
Read post
// token usage audit — gpt-4o, 1000 requests
const rawHtml = { avg: 8420, total: 8_420_000 }
const cleaned = { avg: 674, total: 674_000 }
 
const costPer1M = 2.50 // gpt-4o input
const savings = (rawHtml.total - cleaned.total) / 1_000_000 * costPer1M
console.log(`Saved $${savings.toFixed(2)} per 1k calls`)
// Saved $19.37 per 1k calls
How We Cut Token Costs by 92% for AI Agents That Browse the Web
LLM bills scale with tokens. If your agent reads web pages, most of those tokens are garbage. We ran the numbers, and the waste is more expensive than you think.
Read post
// traditional pipeline
async function scrape(url) {
  const html = await fetch(url).then(r => r.text())
  const $ = cheerio.load(html) // hand-roll selectors
  return $('.article-body p').text() // breaks every deploy
}
 
// ai-ready pipeline
const result = await neureil.extract(url)
// always works, zero selectors
Web Scraping for AI Pipelines Is Completely Different from Traditional Scraping
CSS selectors and XPath were built for extracting specific fields from known page layouts. AI pipelines need something else entirely. Here is what changed.
Read post
// neureil response shape
{
"title": "Understanding Transformers",
"author": "Andrej Karpathy",
"published": "2025-11-04",
"content": "The attention mechanism...",
"tokens": 812,
"cached": false
}
Why Structured Data Extraction APIs Are Replacing Custom Scrapers in AI Stacks
Developers used to write custom scrapers for every site. Now that AI is the consumer, the requirements have completely changed. Structure beats selectors.
Read post