Cheapest LLM API by task tier
Models ranked by blended cost in $ per 1M tokens, a weighted average in which the output price counts three times for each input price (3:1, the typical agent mix), grouped by minimum context window. Live status shown per row.
Bottom line: the cheapest tracked model overall is Command R7B (12-2024) (Cohere) at $0.04/1M in, $0.15/1M out.
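The page does not spell out the blending formula, so the following is a minimal sketch under an assumption: the blended figure is (input + 3 × output) / 4. That reading reproduces the Blended column in the tables to within a cent. The function name `blended_cost` and its `output_weight` parameter are illustrative, not part of any published API.

```python
def blended_cost(input_per_m: float, output_per_m: float,
                 output_weight: float = 3.0) -> float:
    """Weighted average of input and output price in $ per 1M tokens.

    Assumption: output is counted `output_weight` times for each input,
    i.e. (input + 3 * output) / 4 for the 3:1 mix described on this page.
    """
    return (input_per_m + output_weight * output_per_m) / (1.0 + output_weight)


# Command R7B (12-2024): $0.04 in, $0.15 out
print(f"${blended_cost(0.04, 0.15):.2f}")  # prints $0.12

# GPT-4o mini (legacy): $0.15 in, $0.60 out
print(f"${blended_cost(0.15, 0.60):.2f}")  # prints $0.49
```

If your own workload is input-heavy (long prompts, short answers), pass a smaller `output_weight` and the ranking may change.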
Cheapest with ≥128K context
| # | Model | Provider | Input /1M | Output /1M | Context | Blended |
|---|---|---|---|---|---|---|
| 1 | Command R7B (12-2024) | Cohere | $0.04 | $0.15 | 128K | $0.12 |
| 2 | GPT-OSS 20B | Groq | $0.08 | $0.30 | 131K | $0.24 |
| 3 | DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 1M | $0.25 |
| 4 | Llama 3.3 70B (via OpenRouter) | OpenRouter | $0.10 | $0.32 | 131K | $0.27 |
| 5 | Llama 4 Scout (17Bx16E) | Groq | $0.11 | $0.34 | 131K | $0.28 |
| 6 | GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 1M | $0.33 |
| 7 | Gemini 2.5 Flash-Lite | Google Gemini | $0.10 | $0.40 | 1.05M | $0.33 |
| 8 | GPT-4o mini (legacy) | OpenAI | $0.15 | $0.60 | 128K | $0.49 |
Absolute cheapest (any context)
| # | Model | Provider | Input /1M | Output /1M | Context | Blended |
|---|---|---|---|---|---|---|
| 1 | Command R7B (12-2024) | Cohere | $0.04 | $0.15 | 128K | $0.12 |
| 2 | GPT-OSS 20B | Groq | $0.08 | $0.30 | 131K | $0.24 |
| 3 | DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 1M | $0.25 |
| 4 | Llama 3.3 70B (via OpenRouter) | OpenRouter | $0.10 | $0.32 | 131K | $0.27 |
| 5 | Llama 4 Scout (17Bx16E) | Groq | $0.11 | $0.34 | 131K | $0.28 |
| 6 | GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 1M | $0.33 |
| 7 | Gemini 2.5 Flash-Lite | Google Gemini | $0.10 | $0.40 | 1.05M | $0.33 |
| 8 | GPT-4o mini (legacy) | OpenAI | $0.15 | $0.60 | 128K | $0.49 |
Cheapest with ≥1M context
| # | Model | Provider | Input /1M | Output /1M | Context | Blended |
|---|---|---|---|---|---|---|
| 1 | DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 1M | $0.25 |
| 2 | GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 1M | $0.33 |
| 3 | Gemini 2.5 Flash-Lite | Google Gemini | $0.10 | $0.40 | 1.05M | $0.33 |
| 4 | DeepSeek V4 Pro | DeepSeek | $0.44 | $0.87 | 1M | $0.76 |
| 5 | DeepSeek V4 Pro (via OpenRouter) | OpenRouter | $0.44 | $0.87 | 1.05M | $0.76 |
| 6 | Gemini 3.1 Flash-Lite | Google Gemini | $0.25 | $1.50 | 1.05M | $1.19 |
Automate this: agents can call cheapest_model(min_context=…, operational_only=true) over MCP/x402 to route each request to the cheapest live model.
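To illustrate the selection logic, here is a local sketch that filters by context window and picks the lowest blended cost. It is not the hosted MCP/x402 endpoint and makes no network calls. The `Model` class, the `MODELS` list, and the (input + 3 × output) / 4 formula are assumptions for illustration. Context sizes are approximated from the tables (128K as 128,000 and 1M as 1,000,000), and the `operational` flag is a placeholder because this page's tables carry no status data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    input_per_m: float      # $ per 1M input tokens
    output_per_m: float     # $ per 1M output tokens
    context_tokens: int     # approximate context window
    operational: bool = True  # placeholder: no real status data here

    @property
    def blended(self) -> float:
        # Assumed 3:1 output-weighted average
        return (self.input_per_m + 3 * self.output_per_m) / 4


# A few rows copied from the tables above (prices as listed on this page)
MODELS = [
    Model("Command R7B (12-2024)", "Cohere", 0.04, 0.15, 128_000),
    Model("DeepSeek V4 Flash", "DeepSeek", 0.14, 0.28, 1_000_000),
    Model("GPT-4.1 nano", "OpenAI", 0.10, 0.40, 1_000_000),
    Model("Gemini 2.5 Flash-Lite", "Google Gemini", 0.10, 0.40, 1_048_576),
]


def cheapest_model(models: Iterable[Model], min_context: int = 0,
                   operational_only: bool = True) -> Optional[Model]:
    """Return the lowest-blended-cost model meeting the constraints, or None."""
    candidates = [
        m for m in models
        if m.context_tokens >= min_context
        and (m.operational or not operational_only)
    ]
    return min(candidates, key=lambda m: m.blended, default=None)


print(cheapest_model(MODELS).name)                         # Command R7B (12-2024)
print(cheapest_model(MODELS, min_context=1_000_000).name)  # DeepSeek V4 Flash
```

Returning `None` when nothing qualifies lets the caller decide whether to relax the context requirement or fail the request.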