The Budget Tier Has Never Been This Good
Eighteen months ago, "cheap" meant "barely usable." In June 2026, several models priced under $0.50 per million input tokens hold their own on real work. If you are paying flagship rates for classification, extraction, routing, or routine chat, you are likely overpaying by 10-100x.
The Cheapest Capable Models, Ranked by Input Price
| Model | Input $/1M | Output $/1M | Best For |
|---|---|---|---|
| Mistral Nemo | $0.02 | $0.03 | Simple classification, routing |
| GPT-5 Nano | $0.05 | $0.40 | High-volume structured tasks |
| Mistral Small | $0.06 | $0.18 | EU-hosted, GDPR-sensitive work |
| Llama 4 Scout | $0.10 | $0.30 | Open weights, long documents |
| DeepSeek V4 Flash | $0.14 | $0.28 | Best quality per dollar overall |
| Grok 4.1 Fast | $0.20 | $0.50 | 2M context, agentic tool use |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | Google ecosystem, multimodal |
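
The table lists input and output prices separately, and a single blended figure depends on how your workload splits between the two. Here is a minimal sketch of that calculation, using the prices from the table and assuming a 3:1 input-to-output token mix. The ratio, and the names `blended_cost` and `rank_by_blended`, are illustrative assumptions rather than anything the table defines.

```python
# Prices in USD per 1M tokens, copied from the table above: (input, output).
PRICES = {
    "Mistral Nemo": (0.02, 0.03),
    "GPT-5 Nano": (0.05, 0.40),
    "Mistral Small": (0.06, 0.18),
    "Llama 4 Scout": (0.10, 0.30),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "Grok 4.1 Fast": (0.20, 0.50),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
}


def blended_cost(input_price, output_price, input_share=0.75):
    """Weighted average price per 1M tokens.

    input_share is the fraction of all tokens that are input tokens.
    The 0.75 default (a 3:1 input:output mix) is an assumption.
    """
    return input_price * input_share + output_price * (1.0 - input_share)


def rank_by_blended(prices, input_share=0.75):
    """Return (model, blended $/1M) pairs, cheapest first."""
    scored = [
        (name, blended_cost(inp, out, input_share))
        for name, (inp, out) in prices.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, cost in rank_by_blended(PRICES):
        print(f"{name:<24} ${cost:.4f} per 1M blended tokens")
```

The order shifts with the mix. At 3:1, for example, Mistral Small comes out ahead of GPT-5 Nano because of Nano's steeper output price, so it is worth rerunning the ranking with your own `input_share`.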
Three Picks We Would Actually Build On
DeepSeek V4 Flash ($0.14/$0.28) is the value king. It scores close to models priced 10x higher on benchmarks, and DeepSeek's cache-hit pricing drops input to as low as $0.0028 per million tokens on repeated context, which makes input effectively free for agent loops.
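To see what the cache-hit rate means for an agent loop, the rough sketch below compares the input bill with and without caching. It rests on a simplifying assumption that is ours, not DeepSeek's: the same context is resent on every step, the first step is a full cache miss, and every later step is billed entirely at the cached rate. Real loops usually grow their context each turn, so treat the result as a best case.

```python
UNCACHED_INPUT = 0.14   # USD per 1M input tokens (table price)
CACHED_INPUT = 0.0028   # USD per 1M input tokens on a cache hit (figure quoted above)


def loop_input_cost(context_tokens, steps, cache_hit=True):
    """Input cost in USD for an agent loop that resends the same context each step.

    Assumption: the first step is always a cache miss; with cache_hit=True,
    every later step is billed entirely at the cached rate.
    """
    if steps < 1:
        return 0.0
    first = context_tokens * UNCACHED_INPUT / 1_000_000
    later_rate = CACHED_INPUT if cache_hit else UNCACHED_INPUT
    later = (steps - 1) * context_tokens * later_rate / 1_000_000
    return first + later


if __name__ == "__main__":
    # Hypothetical loop: 50,000 tokens of shared context over 20 steps.
    print(f"no caching: ${loop_input_cost(50_000, 20, cache_hit=False):.5f}")
    print(f"cached:     ${loop_input_cost(50_000, 20, cache_hit=True):.5f}")
```

For this hypothetical loop the input cost falls from about $0.14 to under $0.01. Output tokens are billed separately and are not affected by the cache.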
Grok 4.1 Fast ($0.20/$0.50) earns its place with a 2M-token context window, the largest of any model we track at any price. For whole-codebase analysis or massive document piles, nothing else at this price comes close.
GPT-5 Nano ($0.05/$0.40) remains the safest pick for strict structured output at extreme volume, backed by OpenAI's tooling and a 400K-token context window.
The Routing Strategy That Cuts Bills 60-90%
The biggest savings do not come from picking one cheap model; they come from routing. Send the easy 80% of requests to a budget model and escalate only ambiguous or high-stakes cases to a flagship. A support pipeline running 100M tokens/month on Claude Opus 4.8 costs roughly $3,000 a month; the same pipeline routing 85% of traffic to DeepSeek V4 Flash lands nearer $500, a cut of about 83%.
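The arithmetic behind that example can be sketched in a few lines. The flagship rate of $30 per million tokens is simply what $3,000 for 100M tokens implies. The budget rate of $0.21 assumes an even split between DeepSeek V4 Flash input and output tokens. The `choose_tier` function, its confidence score, and its 0.8 threshold are hypothetical and stand in for whatever signal your pipeline uses to spot hard requests.

```python
FLAGSHIP_RATE = 30.00  # USD per 1M tokens, implied by $3,000 for 100M tokens
BUDGET_RATE = 0.21     # USD per 1M tokens, assumed 1:1 mix of $0.14 in / $0.28 out


def choose_tier(confidence, high_stakes, threshold=0.8):
    """Pick a tier for one request.

    confidence is a hypothetical 0-1 score from a cheap first-pass classifier;
    high_stakes is a flag set by the caller (for example, refunds or legal text).
    """
    if high_stakes or confidence < threshold:
        return "flagship"
    return "budget"


def monthly_cost(total_tokens_m, budget_share):
    """Monthly bill in USD when budget_share of tokens goes to the budget model.

    total_tokens_m is the monthly volume in millions of tokens.
    """
    budget = total_tokens_m * budget_share * BUDGET_RATE
    flagship = total_tokens_m * (1.0 - budget_share) * FLAGSHIP_RATE
    return budget + flagship


if __name__ == "__main__":
    baseline = monthly_cost(100, 0.0)
    routed = monthly_cost(100, 0.85)
    print(f"all flagship: ${baseline:,.2f}")
    print(f"85% routed:   ${routed:,.2f}")
    print(f"savings:      {1 - routed / baseline:.0%}")
```

Under these assumptions the routed bill comes to about $468, and nearly all of it is the 15% of traffic that still reaches the flagship. That is why the escalation rate matters far more to the total than which budget model you pick.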
Watch the Output-Token Trap
Cheap reasoning models can quietly burn your budget: they "think" in billed output tokens. DeepSeek-R1 at $0.55/$2.19 looks cheap until a single math problem emits 8,000 thinking tokens. For verbose-output workloads, weight your comparison heavily toward the output price column.
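A quick per-request calculation makes the trap concrete. The sketch below uses the DeepSeek-R1 prices and the 8,000 thinking tokens mentioned above; the 200 input tokens and 300 visible answer tokens are made-up figures for illustration, and `request_cost` is a hypothetical helper.

```python
def request_cost(input_tokens, answer_tokens, thinking_tokens,
                 input_price, output_price):
    """Cost in USD of one request; prices are USD per 1M tokens.

    Thinking tokens are billed at the output price, on top of the visible answer.
    """
    billed_output = answer_tokens + thinking_tokens
    return (input_tokens * input_price + billed_output * output_price) / 1_000_000


if __name__ == "__main__":
    R1_INPUT, R1_OUTPUT = 0.55, 2.19  # DeepSeek-R1 prices quoted above

    # Assumed request shape: 200 input tokens, 300 visible answer tokens.
    plain = request_cost(200, 300, 0, R1_INPUT, R1_OUTPUT)
    with_thinking = request_cost(200, 300, 8_000, R1_INPUT, R1_OUTPUT)

    print(f"without thinking tokens: ${plain:.6f}")
    print(f"with 8,000 thinking:     ${with_thinking:.6f}")
    print(f"multiplier:              {with_thinking / plain:.1f}x")
```

With these numbers the thinking tokens alone add about $0.0175 and make the request more than 20 times as expensive as the visible answer would suggest, which is invisible if you compare models on input price only.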
Prices change weekly, so check the live pricing table, then model your exact workload in the calculator. Pair this guide with "How to Cut Your AI Bill in Half" for caching and batching tactics.