Model Intelligence

Frontier AI Model Pareto Analysis

Intelligence Index vs. Token Efficiency — upper-right is better. The dashed frontier connects models where no alternative is both smarter AND cheaper.

Data: Artificial Analysis Intelligence Index v4.0 · Pricing: First-party APIs, blended 3:1 in:out · March 2026
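One reading of "blended 3:1 in:out" is a weighted average of three input tokens per output token. This interpretation is assumed here, and it reproduces the blended prices listed for most models in the table. Prices p are in USD per 1M tokens and efficiency E is in millions of tokens per $1. Claude Sonnet 4.6 serves as the worked example:

```latex
p_{\mathrm{blend}} = \frac{3\,p_{\mathrm{in}} + p_{\mathrm{out}}}{4},
\qquad
E = \frac{1}{p_{\mathrm{blend}}}

% Claude Sonnet 4.6: p_in = 3.00, p_out = 15.00 (USD per 1M tokens)
p_{\mathrm{blend}} = \frac{3(3.00) + 15.00}{4} = 6.00,
\qquad
E = \frac{1}{6.00} \approx 0.17
```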

[Chart: Intelligence Index (y-axis) vs. token efficiency in M tokens per $1 (x-axis) for the 11 models listed below; markers distinguish reasoning from standard models and proprietary (●) from open-weight (◆) models, with upper-right better.]

Pareto Frontier Models

No other model is both smarter AND more cost-efficient

7 on frontier

Model	Provider	Intelligence Index	Tokens per $1	Blended $/M tokens	Input $/M	Output $/M	Type	On frontier
Gemini 2.5 Flash	Google	35	3.8M	$0.26	$0.15	$0.60	⚡ Reasoning	Yes
Grok 4.1 Fast	xAI	38	3.6M	$0.28	$0.20	$0.50	Standard	Yes
DeepSeek V3.2 (R)	DeepSeek	40	3.2M	$0.32	$0.28	$0.42	⚡ Reasoning, ◆ Open-weight	Yes
Claude Sonnet 4.6	Anthropic	52	0.17M	$6.00	$3.00	$15.00	⚡ Reasoning	No
Claude Opus 4.6	Anthropic	53	0.10M	$10.00	$5.00	$25.00	⚡ Reasoning	No
MiniMax M2.7	MiniMax	46	1.9M	$0.53	$0.30	$1.20	⚡ Reasoning, ◆ Open-weight	Yes
Qwen3.5 397B	Alibaba	45	1.1M	$0.88	$0.50	$2.00	⚡ Reasoning, ◆ Open-weight	No
Kimi K2.5	Moonshot AI	47	1.1M	$0.90	$0.45	$2.25	⚡ Reasoning, ◆ Open-weight	Yes
GLM-5	Zhipu AI	50	0.81M	$1.24	$0.80	$2.56	⚡ Reasoning, ◆ Open-weight	Yes
Gemini 3.1 Pro	Google	55	0.22M	$4.50	$2.00	$12.00	⚡ Reasoning	No
GPT-5.4 (xhigh)	OpenAI	57	0.23M	$4.38	$2.50	$10.00	⚡ Reasoning	Yes
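The derived columns and the frontier count can be reproduced from the listed prices and index scores. The sketch below assumes the blending formula (3 × input + output) / 4 implied by "blended 3:1 in:out". It treats a model as dominated when another model is at least as smart and at least as cheap, and strictly better on one of the two. The `Model` class and function names are illustrative and not part of any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    ii: int           # Artificial Analysis Intelligence Index score
    price_in: float   # USD per 1M input tokens
    price_out: float  # USD per 1M output tokens

    @property
    def blended(self) -> float:
        """Blended USD per 1M tokens, assuming a 3:1 input:output token mix."""
        return (3 * self.price_in + self.price_out) / 4

    @property
    def tokens_per_dollar(self) -> float:
        """Millions of tokens per $1 (the chart's x-axis)."""
        return 1 / self.blended


# Index scores and list prices copied from the table.
MODELS = [
    Model("Gemini 2.5 Flash", 35, 0.15, 0.60),
    Model("Grok 4.1 Fast", 38, 0.20, 0.50),
    Model("DeepSeek V3.2", 40, 0.28, 0.42),
    Model("Claude Sonnet 4.6", 52, 3.00, 15.00),
    Model("Claude Opus 4.6", 53, 5.00, 25.00),
    Model("MiniMax M2.7", 46, 0.30, 1.20),
    Model("Qwen3.5 397B", 45, 0.50, 2.00),
    Model("Kimi K2.5", 47, 0.45, 2.25),
    Model("GLM-5", 50, 0.80, 2.56),
    Model("Gemini 3.1 Pro", 55, 2.00, 12.00),
    Model("GPT-5.4", 57, 2.50, 10.00),
]


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.ii >= b.ii and a.blended <= b.blended
    better = a.ii > b.ii or a.blended < b.blended
    return no_worse and better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models no other model dominates, cheapest first."""
    frontier = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(frontier, key=lambda m: m.blended)


if __name__ == "__main__":
    for m in pareto_frontier(MODELS):
        print(f"{m.name:<18} II={m.ii:<3} ${m.blended:.2f}/M  "
              f"{m.tokens_per_dollar:.2f}M tok/$1")
```

Run as a script, it should list seven models, from Gemini 2.5 Flash up to GPT-5.4. Qwen3.5 397B, Gemini 3.1 Pro, Claude Sonnet 4.6, and Claude Opus 4.6 drop out because a cheaper model scores higher than each of them. The pairwise check is O(n²), which is fine for a dozen models. Sorting by price and sweeping for a running maximum score gives the same result in O(n log n).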

What the frontier tells us

Intelligence is getting cheaper, fast.

The gap between the smartest and most efficient models is shrinking every quarter. Models that cost $4/M tokens six months ago now have competitors at $0.30/M with 70% of the capability.

Open-weight is winning on efficiency.

Four open-weight models (DeepSeek V3.2, MiniMax M2.7, Kimi K2.5, and GLM-5) sit on the Pareto frontier in this snapshot. A likely reason is that competitive pressure and community optimization push their prices down faster than proprietary development cycles do.

The best strategy is model-agnostic.

No single model wins on every dimension. The better approach is to route each task to the model best suited to that specific job, which is exactly what a Millie agent does automatically.

Key insight

The frontier is splitting in two. At the top, Gemini 3.1 Pro, Claude Opus 4.6, and GPT-5.4 compete for peak intelligence. GPT-5.4 leads the index at 57, and Gemini 3.1 Pro scores 55 at less than half the blended price of Opus. Below them, open-weight models like MiniMax M2.7, Kimi K2.5, and DeepSeek V3.2 deliver roughly 70-82% of GPT-5.4's index score at about 5-14× lower blended cost. The smartest strategy isn't picking one; it's using both tiers.

Why this matters for Millie

For each task, your Millie agent picks the most cost-efficient model that meets that task's intelligence threshold. Routine operations run on the cheapest models on the frontier, and complex reasoning tasks use the most capable models available. We track the Pareto frontier continuously, so when a better model ships, your agent upgrades automatically. You get smarter AI at lower cost without lifting a finger.
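The selection rule can be sketched in a few lines: choose the cheapest model that clears an intelligence threshold. This sketch is not Millie's implementation, which isn't described here. It assumes each task carries a hypothetical minimum Intelligence Index score (`min_ii`) and draws candidates from the frontier models in the table, with blended prices computed at 3:1 input:output.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    ii: int               # Intelligence Index score
    blended_per_m: float  # blended USD per 1M tokens (3:1 input:output)


# Frontier models from the table; blended = (3 * input + output) / 4.
FRONTIER: tuple[Model, ...] = (
    Model("Gemini 2.5 Flash", 35, 0.2625),
    Model("Grok 4.1 Fast", 38, 0.275),
    Model("DeepSeek V3.2", 40, 0.315),
    Model("MiniMax M2.7", 46, 0.525),
    Model("Kimi K2.5", 47, 0.90),
    Model("GLM-5", 50, 1.24),
    Model("GPT-5.4", 57, 4.375),
)


def route(min_ii: int, models: Sequence[Model] = FRONTIER) -> Model:
    """Return the cheapest model whose index score meets the hypothetical
    per-task threshold `min_ii`; raise ValueError if none qualifies."""
    eligible = [m for m in models if m.ii >= min_ii]
    if not eligible:
        raise ValueError(f"no model reaches an index score of {min_ii}")
    return min(eligible, key=lambda m: m.blended_per_m)
```

For example, `route(45)` returns MiniMax M2.7 and `route(55)` returns GPT-5.4. A real router would likely also weigh latency, context length, and expected output length, which this sketch ignores.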
