Vision Statement
From 1946 to 2009, computing efficiency—performance per watt—doubled every 1.5 years. This trend, documented by Koomey and colleagues, transformed where computing could happen. Workloads migrated from mainframe rooms to desktops, then laptops, then pockets. The transition from centralized time-sharing to personal computing didn't occur because PCs surpassed mainframes in raw performance. It occurred when efficiency gains made computing capable enough within the power constraints of personal devices.
Today, most AI queries flow through centralized datacenters, and demand is growing steeply: token processing has increased 1300×, a year-over-year scaling rate that strains power grids. Yet telemetry shows that 77% of requests are practical tasks (writing emails, summarizing documents, seeking information) that don't require frontier-scale models.
We propose INTELLIGENCE PER WATT (IPW)—task accuracy per unit of power—as a unified metric for understanding this transition. Just as performance-per-watt guided the mainframe-to-PC shift, intelligence-per-watt clarifies the path from centralized AI to distributed intelligence. IPW provides a common framework for studying three questions shaping AI's future:
Workload Redistribution: From Cloud to Edge
Local language models (≤20B parameters) now accurately answer 88.7% of single-turn queries, and consumer accelerators run them at interactive latencies. IPW improved 5.3× from 2023–2025—3.1× from model advances, 1.7× from hardware gains. By measuring intelligence efficiency across the model-hardware landscape, we can identify which queries belong on which devices. Hybrid systems that route queries appropriately cut energy, compute, and cost by 60–80% while preserving quality. IPW tracks this redistribution as it unfolds.
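Read plainly, IPW is task accuracy divided by average power draw, and the 5.3× figure decomposes multiplicatively into the model and hardware contributions. A minimal sketch, assuming that exact accuracy-per-watt reading of IPW (the function name is ours; the gain figures are the ones reported above):

```python
def intelligence_per_watt(accuracy: float, avg_power_watts: float) -> float:
    """Task accuracy per unit of power draw (one plain reading of IPW)."""
    if not 0.0 <= accuracy <= 1.0 or avg_power_watts <= 0:
        raise ValueError("accuracy must be in [0, 1] and power must be positive")
    return accuracy / avg_power_watts

# Attribution check: independent model and hardware gains compound
# multiplicatively, consistent with the reported 5.3x overall improvement.
model_gain = 3.1     # from model advances, 2023-2025
hardware_gain = 1.7  # from hardware gains, 2023-2025
combined_gain = model_gain * hardware_gain  # ~5.27, reported as 5.3x
```

Under this reading, a higher IPW means more correct answers delivered for the same power budget, which is what makes local accelerators competitive on the practical queries they can answer accurately.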
Economic Value: Measuring AI's Real-World Impact
Not all intelligence is equal. A model that handles graduate-level physics but fails at email drafting delivers different economic value than one with the opposite profile. By weighting IPW against GDP-relevant task distributions, we can quantify how much economic value AI systems generate per watt consumed. This lens reveals where current systems create value, where gaps remain, and how efficiency gains translate into productivity across economic sectors.
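Weighting IPW against a GDP-relevant task distribution can be sketched as a normalized weighted sum of per-task accuracy-per-watt. The function name, data shapes, and example figures below are hypothetical illustrations, not measured values:

```python
def gdp_weighted_ipw(task_stats: dict, gdp_weights: dict) -> float:
    """Weight per-task IPW (accuracy / watts) by each task's share of economic value.

    task_stats: task -> (accuracy in [0, 1], average power in watts)
    gdp_weights: task -> nonnegative economic weight; normalized internally.
    """
    total_weight = sum(gdp_weights.values())
    return sum(
        (gdp_weights[task] / total_weight) * (accuracy / watts)
        for task, (accuracy, watts) in task_stats.items()
    )

# Hypothetical example: a model strong at common tasks but weak at
# graduate-level physics still scores well under economic weighting.
stats = {"email_drafting": (0.9, 10.0), "graduate_physics": (0.4, 100.0)}
weights = {"email_drafting": 0.7, "graduate_physics": 0.3}
score = gdp_weighted_ipw(stats, weights)
```

The design choice here is that the weights, not raw benchmark difficulty, determine the metric: an efficiency gain on an economically common task moves the score more than the same gain on a rare one.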
National Competitiveness: The Global AI Race
The nation that most efficiently converts energy into deployed intelligence gains advantage. We introduce Gross Domestic Intelligence (GDI)—the product of intelligence-per-watt and accessible power—as a framework for AI competition. China and the United States face inverse constraints: China is compute-bound by export controls on advanced chips; America is energy-bound by grid limitations and datacenter bottlenecks. IPW reveals an asymmetric American asset: hundreds of millions of local accelerators already deployed in homes and offices. This installed base could boost effective AI capacity 2–4× without new datacenter construction.
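GDI is defined above as the product of intelligence-per-watt and accessible power. Summing that product over device classes (datacenters, installed local accelerators) gives a fleet-level view. The function is a sketch under that summation assumption, and the numbers are purely illustrative, not estimates:

```python
def gross_domestic_intelligence(fleet: list) -> float:
    """Sum of IPW x accessible power over device classes.

    fleet: list of (intelligence_per_watt, accessible_power_watts) pairs.
    """
    return sum(ipw * watts for ipw, watts in fleet)

# Illustrative only: an already-installed edge fleet adds effective
# capacity without any new datacenter construction.
datacenters = (0.001, 1e9)  # hypothetical (IPW, accessible watts)
edge_fleet = (0.003, 5e8)   # hypothetical (IPW, accessible watts)
baseline = gross_domestic_intelligence([datacenters])
with_edge = gross_domestic_intelligence([datacenters, edge_fleet])
```

Note how the framing captures both constraints at once: China's compute bound caps the IPW term, while America's energy bound caps the accessible-power term.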
The IPW Research Agenda
We're pursuing a coordinated research program to understand and maximize intelligence efficiency across the full stack.
| Category | Initiative | Objective |
|---|---|---|
| Measurement & Benchmarking | GDP-Weighted Evaluation | Quantifying economic value generated per watt on real-world, GDP-relevant tasks. |
| Measurement & Benchmarking | IPW Attribution | Decomposing efficiency gains into algorithmic versus hardware contributions through continuous benchmarking. |
| National Competitiveness | Gross Domestic Intelligence | Identifying high-impact interventions across inference systems, power grids, and model architectures. |
| Models & Systems | Post-training for IPW | Training local models to use frontier models as tools for verification and sophisticated assistance. |
| Models & Systems | Hybrid Inference Engine | Building systems that automatically route work between local and cloud compute to maximize IPW subject to latency, privacy, and cost constraints. |
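The Hybrid Inference Engine objective can be sketched as a greedy policy: filter candidate targets to those meeting the latency and privacy constraints, then pick the feasible target with the best accuracy-per-watt. The `Target` fields and routing rule below are our assumptions for illustration, not a description of the actual system (a cost cap would be one more filter in the same place):

```python
from dataclasses import dataclass

@dataclass
class Target:
    name: str
    accuracy: float     # expected accuracy for this query class
    power_watts: float  # average power draw while serving
    latency_s: float    # expected end-to-end latency
    local: bool         # runs on-device, so data never leaves the machine

def route(targets: list, max_latency_s: float, require_local: bool) -> Target:
    """Pick the constraint-satisfying target that maximizes IPW."""
    feasible = [
        t for t in targets
        if t.latency_s <= max_latency_s and (t.local or not require_local)
    ]
    if not feasible:
        raise ValueError("no target satisfies the constraints")
    return max(feasible, key=lambda t: t.accuracy / t.power_watts)
```

With hypothetical numbers, a local 8B-class model at 0.85 accuracy and 15 W beats a frontier endpoint at 0.95 accuracy and 700 W on IPW by more than an order of magnitude, which is the mechanism behind the 60–80% savings cited above: most practical queries never need to leave the device.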