Model Intelligence
Frontier AI Model Pareto Analysis
Intelligence Index vs. token efficiency (tokens per dollar): upper-right is better. The dashed frontier connects the models for which no alternative is both smarter AND cheaper.
Data: Artificial Analysis Intelligence Index v4.0 · Pricing: First-party APIs, blended 3:1 in:out · March 2026
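As a rough illustration of the 3:1 input-to-output blend mentioned above, the blended price can be read as a weighted average of the two list prices. The sketch below assumes that interpretation; the function names `blended_price` and `tokens_per_dollar` are made up for this example and are not part of any provider's API.

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted-average price in USD per million tokens (assumed 3:1 in:out)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_per_m + output_weight * output_per_m) / total_weight


def tokens_per_dollar(price_per_m: float) -> float:
    """Number of tokens one US dollar buys at a given per-million price."""
    return 1_000_000 / price_per_m


# GPT-5.4 (xhigh) list prices from the table: $2.50 input, $10.00 output.
gpt_blended = blended_price(2.50, 10.00)
print(f"Blended: ${gpt_blended:.3f} per million tokens")
print(f"Efficiency: {tokens_per_dollar(gpt_blended) / 1e6:.2f}M tokens per $1")
```

Changing the weights shows how sensitive the ranking is to workload shape: an output-heavy workload penalizes models with expensive output tokens more than the 3:1 blend suggests.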
Pareto Frontier Models
No other model is both smarter AND more cost-efficient
| Model | Provider | Intelligence Index (II) | Tokens per $1 | Blended $/M tokens | Input $/M tokens | Output $/M tokens | Type |
|---|---|---|---|---|---|---|---|
| Gemini 2.5 Flash | Google | 35 | 3.9M | $0.26 | $0.15 | $0.60 | ⚡ Reasoning |
| Grok 4.1 Fast | xAI | 38 | 3.6M | $0.28 | $0.20 | $0.50 | Standard |
| DeepSeek V3.2 (R) | DeepSeek | 40 | 3.1M | $0.32 | $0.28 | $0.42 | ⚡ Reasoning · ◆ Open |
| Claude Sonnet 4.6 | Anthropic | 52 | 0.2M | $6.00 | $3.00 | $15.00 | ⚡ Reasoning |
| Claude Opus 4.6 | Anthropic | 53 | 0.1M | $10.00 | $5.00 | $25.00 | ⚡ Reasoning |
| MiniMax M2.7 | MiniMax | 46 | 1.9M | $0.53 | $0.30 | $1.20 | ⚡ Reasoning · ◆ Open |
| Qwen3.5 397B | Alibaba | 45 | 1.1M | $0.88 | $0.50 | $2.00 | ⚡ Reasoning · ◆ Open |
| Kimi K2.5 | Moonshot AI | 47 | 1.1M | $0.90 | $0.45 | $2.25 | ⚡ Reasoning · ◆ Open |
| GLM-5 | Zhipu AI | 50 | 0.8M | $1.24 | $0.80 | $2.56 | ⚡ Reasoning · ◆ Open |
| Gemini 3.1 Pro | Google | 55 | 0.2M | $4.67 | $2.00 | $12.00 | ⚡ Reasoning |
| GPT-5.4 (xhigh) | OpenAI | 57 | 0.2M | $4.38 | $2.50 | $10.00 | ⚡ Reasoning |
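To make the "no other model is both smarter and cheaper" test concrete, here is a minimal sketch of a Pareto check over (Intelligence Index, blended price) pairs. It uses the common weak-dominance definition (at least as good on both axes, strictly better on one), which is an assumption rather than the page's stated method. The rows are a subset of the table, plus one entry named `Example-Dominated` that is hypothetical and included only to show a model being filtered out.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    intelligence: float  # Intelligence Index score (higher is better)
    price: float         # blended USD per million tokens (lower is better)


def dominates(a: Model, b: Model) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    return (a.intelligence >= b.intelligence and a.price <= b.price
            and (a.intelligence > b.intelligence or a.price < b.price))


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep every model that no other model dominates, sorted cheapest first."""
    frontier = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(frontier, key=lambda m: m.price)


models = [
    Model("Gemini 2.5 Flash", 35, 0.26),
    Model("DeepSeek V3.2 (R)", 40, 0.32),
    Model("MiniMax M2.7", 46, 0.53),
    Model("GLM-5", 50, 1.24),
    Model("GPT-5.4 (xhigh)", 57, 4.38),
    Model("Example-Dominated", 39, 0.60),  # hypothetical, not in the table
]

frontier = pareto_frontier(models)
for m in frontier:
    print(f"{m.name}: II {m.intelligence}, ${m.price:.2f}/M")
```

The pairwise check is quadratic in the number of models, which is fine for a catalog of a few dozen entries.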
What the frontier tells us
Intelligence is getting cheaper, fast.
The gap between the smartest and the most efficient models is shrinking every quarter. Models priced around $4 per million tokens six months ago now face competitors at about $0.30 per million that score roughly 70% as high on the Intelligence Index.
Open-weight is winning on efficiency.
Open-weight models consistently appear on the Pareto frontier because competitive pressure and community optimization drive costs down faster than proprietary development cycles.
The best strategy is model-agnostic.
No single model wins on every dimension. The optimal approach is routing each task to the best model for that specific job — which is exactly what a Millie agent does automatically.
Key insight
The frontier is splitting in two. At the top, Gemini 3.1 Pro, Claude Opus 4.6, and GPT-5.4 compete for peak intelligence, with Gemini 3.1 Pro leading most benchmarks at roughly half the blended price of Opus. Below them, open-weight models like MiniMax M2.7, Kimi K2.5, and DeepSeek V3.2 deliver roughly 70-90% of the top tier's Intelligence Index score at 5-20× lower cost. The smartest strategy isn't picking one tier; it's using both.
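The trade-off between the two tiers can be checked with simple ratios from the table. The sketch below is illustrative arithmetic only: it treats the Intelligence Index as if scores were directly proportional, which is a simplifying assumption, and the helper name `compare` is made up for this example.

```python
def compare(efficient: tuple[float, float], top: tuple[float, float]) -> tuple[float, float]:
    """Return (share of the top model's score, how many times cheaper).

    Each argument is (Intelligence Index, blended USD per million tokens).
    """
    score_share = efficient[0] / top[0]
    times_cheaper = top[1] / efficient[1]
    return score_share, times_cheaper


# Values taken from the table above.
minimax_m27 = (46, 0.53)
kimi_k25 = (47, 0.90)
gpt_54 = (57, 4.38)
claude_opus_46 = (53, 10.00)

for label, cheap, top in [
    ("MiniMax M2.7 vs GPT-5.4", minimax_m27, gpt_54),
    ("Kimi K2.5 vs Claude Opus 4.6", kimi_k25, claude_opus_46),
]:
    share, cheaper = compare(cheap, top)
    print(f"{label}: {share:.0%} of the score at {cheaper:.1f}x lower cost")
```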
Why this matters for Millie
Your Millie agent uses the right model for each task — the most cost-efficient model that meets the intelligence threshold. Routine operations run on frontier-efficient models. Complex reasoning tasks use the most capable models available. We track the Pareto frontier continuously, so when a better model ships, your agent upgrades automatically. You get smarter AI at lower cost without lifting a finger.
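The selection rule described above, the cheapest model that meets an intelligence threshold, can be sketched in a few lines. This is not Millie's actual routing code: the `pick_model` function, the `TASK_THRESHOLDS` values, and the task labels are all hypothetical, and the catalog is a subset of the table.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str
    intelligence: float  # Intelligence Index score
    price: float         # blended USD per million tokens


CATALOG = [
    Model("Gemini 2.5 Flash", 35, 0.26),
    Model("DeepSeek V3.2 (R)", 40, 0.32),
    Model("MiniMax M2.7", 46, 0.53),
    Model("GLM-5", 50, 1.24),
    Model("Gemini 3.1 Pro", 55, 4.67),
    Model("GPT-5.4 (xhigh)", 57, 4.38),
]

# Hypothetical thresholds, chosen only for illustration.
TASK_THRESHOLDS = {"routine": 35, "analysis": 45, "complex_reasoning": 55}


def pick_model(min_intelligence: float,
               catalog: list[Model] = CATALOG) -> Optional[Model]:
    """Return the cheapest model whose score meets the threshold, or None."""
    eligible = [m for m in catalog if m.intelligence >= min_intelligence]
    return min(eligible, key=lambda m: m.price) if eligible else None


for task, threshold in TASK_THRESHOLDS.items():
    choice = pick_model(threshold)
    print(f"{task}: {choice.name if choice else 'no eligible model'}")
```

Because the rule reads from a catalog, adding a newly released model to the list is enough for it to be considered on the next request.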