Model Intelligence

Frontier AI Model Pareto Analysis

Intelligence Index vs. Token Efficiency — upper-right is better. The dashed frontier connects models where no alternative is both smarter AND cheaper.

Data: Artificial Analysis Intelligence Index v4.0 · Pricing: First-party APIs, blended 3:1 in:out · March 2026
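The pricing note above can be checked with a short calculation. This is a minimal sketch that assumes "blended 3:1 in:out" means a weighted average counting three input tokens for every output token. `blended_price` and `m_tokens_per_dollar` are illustrative helpers, not part of any published tool.

```python
# Hypothetical helpers: reproduce a "blended 3:1 in:out" price and the
# matching tokens-per-dollar figure. Assumes the blend is a weighted
# average with 3 parts input price to 1 part output price.

def blended_price(input_per_m: float, output_per_m: float,
                  in_parts: int = 3, out_parts: int = 1) -> float:
    """Weighted-average price in $ per million tokens."""
    return (in_parts * input_per_m + out_parts * output_per_m) / (in_parts + out_parts)


def m_tokens_per_dollar(blended_per_m: float) -> float:
    """Millions of blended tokens that $1 buys at the given $/M price."""
    return 1.0 / blended_per_m


# Example with the GPT-5.4 list prices from the table ($2.50 in, $10.00 out).
price = blended_price(2.50, 10.00)
print(f"${price:.2f}/M blended, {m_tokens_per_dollar(price):.2f}M tokens per $1")
```

Most rows in the table below reproduce under this formula. For example, MiniMax M2.7 gives (3 × $0.30 + $1.20) / 4 = $0.525, which is listed as $0.53. Gemini 3.1 Pro does not reproduce: the formula gives $4.50 against the listed $4.67, so that row may use a different pricing tier or weighting.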

[Figure: Intelligence Index (y-axis) vs. token efficiency in M tokens per $1 (x-axis) for Gemini 2.5 Flash, Grok 4.1 Fast, DeepSeek V3.2, Claude Sonnet 4.6, Claude Opus 4.6, MiniMax M2.7, Qwen3.5 397B, Kimi K2.5, GLM-5, Gemini 3.1 Pro and GPT-5.4. Upper-right is better. Markers distinguish reasoning from standard models and proprietary from open-weight models.]

Pareto Frontier Models

No other model is both smarter AND more cost-efficient

8 on frontier
Model | Provider | Intelligence Index | M tokens per $1 | Blended $/M | Input $/M | Output $/M | Type
Gemini 2.5 Flash | Google | 35 | 3.9 | $0.26 | $0.15 | $0.60 | Reasoning
Grok 4.1 Fast | xAI | 38 | 3.6 | $0.28 | $0.20 | $0.50 | Standard
DeepSeek V3.2 (R) | DeepSeek | 40 | 3.1 | $0.32 | $0.28 | $0.42 | Reasoning, open-weight
Claude Sonnet 4.6 | Anthropic | 52 | 0.17 | $6.00 | $3.00 | $15.00 | Reasoning
Claude Opus 4.6 | Anthropic | 53 | 0.1 | $10.00 | $5.00 | $25.00 | Reasoning
MiniMax M2.7 | MiniMax | 46 | 1.9 | $0.53 | $0.30 | $1.20 | Reasoning, open-weight
Qwen3.5 397B | Alibaba | 45 | 1.1 | $0.88 | $0.50 | $2.00 | Reasoning, open-weight
Kimi K2.5 | Moonshot AI | 47 | 1.1 | $0.90 | $0.45 | $2.25 | Reasoning, open-weight
GLM-5 | Zhipu AI | 50 | 0.8 | $1.24 | $0.80 | $2.56 | Reasoning, open-weight
Gemini 3.1 Pro | Google | 55 | 0.2 | $4.67 | $2.00 | $12.00 | Reasoning
GPT-5.4 (xhigh) | OpenAI | 57 | 0.2 | $4.38 | $2.50 | $10.00 | Reasoning
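The frontier test can be reproduced from the table. This is a minimal sketch that assumes "smarter" means a higher Intelligence Index and "cheaper" means a lower blended $/M price, with strict inequality on both. The `Model` tuple and `pareto_frontier` function are illustrative, and the scores and prices are copied from the table above.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    index: int      # Artificial Analysis Intelligence Index
    blended: float  # blended $ per million tokens (3:1 in:out)


# Values copied from the table above.
MODELS = [
    Model("Gemini 2.5 Flash", 35, 0.26),
    Model("Grok 4.1 Fast", 38, 0.28),
    Model("DeepSeek V3.2 (R)", 40, 0.32),
    Model("Claude Sonnet 4.6", 52, 6.00),
    Model("Claude Opus 4.6", 53, 10.00),
    Model("MiniMax M2.7", 46, 0.53),
    Model("Qwen3.5 397B", 45, 0.88),
    Model("Kimi K2.5", 47, 0.90),
    Model("GLM-5", 50, 1.24),
    Model("Gemini 3.1 Pro", 55, 4.67),
    Model("GPT-5.4 (xhigh)", 57, 4.38),
]


def dominates(a: Model, b: Model) -> bool:
    """True if a is strictly smarter AND strictly cheaper than b."""
    return a.index > b.index and a.blended < b.blended


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep every model that no other model dominates, cheapest first."""
    frontier = [m for m in models
                if not any(dominates(other, m) for other in models)]
    return sorted(frontier, key=lambda m: m.blended)


for m in pareto_frontier(MODELS):
    print(f"{m.name:<20} II={m.index}  ${m.blended:.2f}/M")
```

With these blended prices, GPT-5.4 is both smarter and cheaper than Gemini 3.1 Pro (57 vs. 55, $4.38 vs. $4.67), so the sketch leaves Gemini 3.1 Pro off the frontier. Comparing on the rounded tokens-per-$1 column instead, where both show 0.2, would keep it, so small rounding choices like this can change how many models count as on the frontier.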

What the frontier tells us

Intelligence is getting cheaper, fast.

The gap between the smartest and most efficient models is shrinking every quarter. Models priced around $4 per million tokens six months ago now face competitors at about $0.30 per million that score roughly 70% as high on the Intelligence Index.

Open-weight is winning on efficiency.

Open-weight models consistently appear on the Pareto frontier because competitive pressure and community optimization drive their costs down faster than proprietary development cycles can.

The best strategy is model-agnostic.

No single model wins on every dimension. The optimal approach is routing each task to the best model for that specific job — which is exactly what a Millie agent does automatically.

Key insight

The frontier is splitting in two. At the top, Gemini 3.1 Pro, Claude Opus 4.6, and GPT-5.4 compete for peak intelligence, with Gemini 3.1 Pro scoring above Opus (55 vs. 53) at under half its blended price. Below them, open-weight models like MiniMax M2.7, Kimi K2.5, and DeepSeek V3.2 deliver roughly 70-90% of that Intelligence Index score at roughly 5-30× lower blended cost. The smartest strategy isn’t picking one tier; it’s using both.
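The tier comparison can be checked by dividing the table's scores and blended prices pairwise. This sketch treats a ratio of Intelligence Index scores as a rough "share of intelligence", which is an assumption: the index is a composite benchmark score, not a ratio scale. The dictionaries and the `ratio_ranges` helper are illustrative.

```python
# Intelligence Index score and blended $/M, copied from the table above.
TOP_TIER = {
    "Gemini 3.1 Pro": (55, 4.67),
    "Claude Opus 4.6": (53, 10.00),
    "GPT-5.4 (xhigh)": (57, 4.38),
}
OPEN_WEIGHT = {
    "MiniMax M2.7": (46, 0.53),
    "Kimi K2.5": (47, 0.90),
    "DeepSeek V3.2 (R)": (40, 0.32),
}


def ratio_ranges(top: dict, open_weight: dict) -> tuple:
    """Min/max score share and price multiple over every (open, top) pair."""
    shares, multiples = [], []
    for open_score, open_price in open_weight.values():
        for top_score, top_price in top.values():
            shares.append(open_score / top_score)     # fraction of the top model's score
            multiples.append(top_price / open_price)  # times cheaper per blended token
    return (min(shares), max(shares)), (min(multiples), max(multiples))


(share_lo, share_hi), (mult_lo, mult_hi) = ratio_ranges(TOP_TIER, OPEN_WEIGHT)
print(f"score share: {share_lo:.0%} to {share_hi:.0%}")
print(f"price multiple: {mult_lo:.1f}x to {mult_hi:.1f}x")
```

For these pairs the score share runs from about 70% (DeepSeek V3.2 vs. GPT-5.4) to about 89% (Kimi K2.5 vs. Opus 4.6), and the price multiple from about 4.9× (Kimi K2.5 vs. GPT-5.4) to about 31× (DeepSeek V3.2 vs. Opus 4.6).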

Why this matters for Millie

Your Millie agent uses the right model for each task: the most cost-efficient model that meets the intelligence threshold. Routine operations run on the most cost-efficient models on the frontier. Complex reasoning tasks use the most capable models available. We track the Pareto frontier continuously, so when a better model ships, your agent upgrades automatically. You get smarter AI at lower cost without lifting a finger.
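The selection rule described here, choosing the most cost-efficient model that clears a task's intelligence threshold, can be sketched as a filter followed by a minimum over price. Millie's actual router is not described in this document; the task names, thresholds, model pool, and the `route` function below are all hypothetical, with scores and prices taken from the table above.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str
    index: int      # Intelligence Index score from the table
    blended: float  # blended $ per million tokens


# Hypothetical routing pool: a subset of the models in the table above.
POOL = [
    Model("Gemini 2.5 Flash", 35, 0.26),
    Model("DeepSeek V3.2 (R)", 40, 0.32),
    Model("MiniMax M2.7", 46, 0.53),
    Model("Kimi K2.5", 47, 0.90),
    Model("GLM-5", 50, 1.24),
    Model("GPT-5.4 (xhigh)", 57, 4.38),
]

# Hypothetical minimum Intelligence Index per task type.
THRESHOLDS = {"summarize_email": 35, "draft_report": 45, "multi_step_plan": 55}


def route(task: str, pool: list[Model]) -> Optional[Model]:
    """Return the cheapest model whose index meets the task's threshold."""
    eligible = [m for m in pool if m.index >= THRESHOLDS[task]]
    return min(eligible, key=lambda m: m.blended, default=None)


for task in THRESHOLDS:
    choice = route(task, POOL)
    print(task, "->", choice.name if choice else "no model qualifies")
```

Because the rule depends only on the thresholds and the pool, adding a newly released model to the pool changes which model each task gets without changing the rule itself.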