Last spring, we shared our perspective on the ‘electricity gauntlet’ of AI infrastructure power demand (Part 1, Part 2). In short, today the bottleneck for AI infrastructure isn’t algorithms or capital – it’s physical constraints: power, land, and labor. Of these, power is by far the most challenging and critical.
This reality is now mainstream. Over the past few years, the power sector has been rapidly thrust into the spotlight in Silicon Valley and Washington, and the electricity gauntlet has become one of the most talked-about challenges in technology. By 2035, estimates indicate power demand from AI data centers in the US could grow more than thirtyfold, reaching 123 gigawatts, up from 4 gigawatts in 2024. Companies like GridAstra are seeing huge demand for solutions to rapidly growing interconnection costs and wait times. At GTC 2025, Jensen Huang put it simply: “[AI] revenue is power limited.”
But in five years, the bottleneck may shift. Once electricity supply catches up with demand, the constraint won’t be how fast you can build, but whether the centers you’ve built deliver ROI. That’s when the game flips to optimization, utilization, and flexibility.
This mirrors the lifecycle of any commodity industry. At first, the only goal is to extract – like miners blasting mountains apart just to get ore to market. Efficiency and refining come later, once the raw rush stabilizes.
The AI DC market may follow the same pattern as fiber and telecom in the 1990s. Back then, everyone was racing to lay fiber. The only metric that mattered was miles in the ground. Operators overbuilt, fueled by cheap capital and the belief that bandwidth demand was infinite. The market corrected and consolidated, and the winners spent the next two decades squeezing efficiency from that installed base – better routing, denser multiplexing, more sophisticated pricing, and tiered services.
So where are we in this analogy for the AI DC buildout? We are in Phase 1.
Given the power conversation of Phase 1 is getting its deserved attention, let’s focus on what will be needed in Phases 2 and 3 of market maturity.
There are two challenges that will dominate these next two phases:
In modern AI data centers, GPUs are now the bulk of the build cost. In the 2010s, traditional DCs were built with IT hardware at ~30–40% of the all-in build cost and infrastructure (land, power, cooling, backup systems) at ~60–70%. Today, modern AI clusters see hardware – now dominated by GPUs and accelerators – closer to 70%+ of total build cost, depending on design. For example, Microsoft’s 2026 data center in Wisconsin is a $4B project, with the bulk of the cost going into hundreds of thousands of B200 GPUs. This is a major inversion of economics. If GPUs dominate costs, every percentage point of idle time is tens of millions in wasted capex: a high-density rack of B200s ($4M upfront cost) sitting at 40% utilization burns through cash much faster than a marginally inefficient cooling system.
So what does that imply will drive value for DCs in Phases 2 and 3? GPU utilization – “effective FLOPs delivered per watt per dollar,” or, more precisely, “token output per watt per dollar per second.” GPU utilization is a complex problem, and one of the key optimization levers the best operators will need to get right.
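To make the economics concrete, here is a minimal back-of-the-envelope sketch. The rack price, depreciation window, power draw, and token throughput below are illustrative assumptions, not vendor or operator figures; the point is how quickly idle capacity compounds, and how a “tokens per watt per dollar”–style metric falls out of the same inputs.

```python
# Back-of-the-envelope economics of an idle GPU rack.
# All inputs are illustrative assumptions, not vendor or operator figures.

RACK_CAPEX_USD = 4_000_000        # high-density GPU rack, upfront cost
DEPRECIATION_YEARS = 5            # assumed useful life
RACK_POWER_KW = 120               # assumed average IT draw at full load
ELECTRICITY_USD_PER_KWH = 0.08    # assumed blended power price
TOKENS_PER_SEC_AT_FULL_LOAD = 2_000_000  # assumed aggregate throughput

HOURS_PER_YEAR = 8760

def annual_cost(utilization: float) -> dict:
    """Annual cost and efficiency of one rack at a given utilization (0-1)."""
    capex_per_year = RACK_CAPEX_USD / DEPRECIATION_YEARS
    energy_kwh = RACK_POWER_KW * HOURS_PER_YEAR * utilization
    power_cost = energy_kwh * ELECTRICITY_USD_PER_KWH
    tokens = TOKENS_PER_SEC_AT_FULL_LOAD * 3600 * HOURS_PER_YEAR * utilization
    total_cost = capex_per_year + power_cost
    return {
        "capex_sitting_idle": capex_per_year * (1 - utilization),
        "tokens_per_dollar": tokens / total_cost,
        "tokens_per_kwh": tokens / energy_kwh if energy_kwh else 0.0,
    }

for u in (0.4, 0.7, 0.9):
    stats = annual_cost(u)
    print(f"utilization {u:.0%}: "
          f"~${stats['capex_sitting_idle']:,.0f}/yr of capex idle, "
          f"{stats['tokens_per_dollar']:,.0f} tokens/$, "
          f"{stats['tokens_per_kwh']:,.0f} tokens/kWh")
```

At fleet scale – hundreds or thousands of such racks – each percentage point of idle time translates into millions to tens of millions of dollars of stranded capex per year, which is where the “effective output per watt per dollar” framing comes from.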
Before data center utilization can be optimized, data center performance needs to be accurately measured. One of the greatest uncertainties in the data center equation is how much power is actually turned into compute, or tokens. Misconceptions around “always-on” consumption are pervasive. Regulators and utilities frequently assume DCs run at ~90% load factor – ratio of (average electricity demand) to (peak electricity demand) year‑round – with statements like “operating at a consistent load factor 365 days a year” from Duke Energy.
This is a broad oversimplification of data centers’ power consumption, performance, and complex load profiles, and the standard metrics are misleading. For example, server uptime is an inaccurate proxy for token output because uptime ensures availability, not active compute. The de facto standard metric since 2007, Power Usage Effectiveness (PUE), is imprecise too – it measures energy overhead, not actual compute delivered.
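For reference, both metrics are simple ratios, and neither term involves compute output – a facility can post a strong load factor and a respectable PUE while its GPUs sit largely idle:

```latex
\text{Load factor} = \frac{\text{average electricity demand}}{\text{peak electricity demand}},
\qquad
\text{PUE} = \frac{\text{total facility energy}}{\text{IT equipment energy}}
```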
Focusing on utilization exposes real inefficiencies, and significant untapped revenue in most data centers. Most colos (companies renting out cages and racks) operate at 30–50% utilization, and even best-in-class hyperscalers find it difficult to sustain utilization above 60–70%. This inefficiency is often underappreciated because transparent, high-quality utilization data is scarce. A 2025 Association for Computing Machinery (ACM) study highlighted this gap: energy consumption is rarely linked to compute capacity, system configuration, or workload type in a consistent way. Few providers disclose actual utilization metrics, leaving most infrastructure planning and operations without a solid empirical foundation.
The nuances of data centers’ peaky load profiles should directly impact how compute jobs are orchestrated and utilized. GPU workload orchestration (e.g., Run:AI, Lambda, CoreWeave) aims to pack jobs more tightly so GPUs don’t sit idle. This is hard: AI power loads are energy-dense, spiky, and “bursty” – both at the job level and across the facility load profile.
The combination of insufficient performance metrics, spiky and uncertain load profiles, and a lack of “power-aware” orchestration results in the stubbornly low utilization rates we see today.
The opportunities we are focused on are in orchestration software, dynamic scheduling systems, “power-aware” workload balancing, and flexibility programs – a kind of “Kubernetes moment” for data center infrastructure. Kubernetes took the messy plumbing of the software world and abstracted bespoke configs and environments into a containerized, standardized orchestration layer. The same portability and optimization are becoming possible at the compute hardware level, particularly for less latency-sensitive workloads – nightly retraining, model fine-tuning, video indexing, video generation, search ranking, synthetic data generation, summarization – facilitating more portable, programmable, and dynamically orchestrated data centers.
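As a minimal sketch of what “power-aware” orchestration means in practice – with made-up job names, power numbers, and a simple greedy policy rather than any particular vendor’s scheduler – deferrable jobs are held back whenever the facility is near its power cap or prices spike, while latency-sensitive jobs are always admitted:

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    power_kw: float        # estimated draw while running
    deferrable: bool       # can it wait for a cheaper/quieter window?

# Illustrative facility state and job queue (all numbers are assumptions).
FACILITY_POWER_CAP_KW = 50_000
current_load_kw = 41_000
spot_price_usd_mwh = 140
PRICE_CEILING_USD_MWH = 90   # above this, deferrable work waits

queue = [
    Job("inference-chat", 3_000, deferrable=False),
    Job("nightly-retrain", 8_000, deferrable=True),
    Job("video-indexing", 5_000, deferrable=True),
]

def schedule(jobs, load_kw):
    """Greedy, power-aware admission: run what fits; defer flexible work
    when the facility is near its cap or power is expensive."""
    started, deferred = [], []
    for job in jobs:
        over_cap = load_kw + job.power_kw > FACILITY_POWER_CAP_KW
        pricey = spot_price_usd_mwh > PRICE_CEILING_USD_MWH
        if job.deferrable and (over_cap or pricey):
            deferred.append(job.name)
        elif not over_cap:
            started.append(job.name)
            load_kw += job.power_kw
        else:
            deferred.append(job.name)   # nothing fits; wait either way
    return started, deferred

started, deferred = schedule(queue, current_load_kw)
print("start now:", started)     # ['inference-chat']
print("defer:", deferred)        # ['nightly-retrain', 'video-indexing']
```

Real orchestrators layer on far more (topology, checkpointing, SLAs), but the core idea is the same: treat power headroom and price as first-class scheduling inputs alongside GPU availability.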
Today, the Siemens and Schneiders of the world capture most of the value in Data Center Infrastructure Management (DCIM) tools. Given most new data center projects are designed for huge scale (100MW and up), the approach is reasonably risk-averse – most operators are unlikely to partner with startups in running these capital-intensive AI factories. This is also true for other core infrastructure, like advanced cooling systems.
However, as the demands and precision required for optimal compute performance increase, the incumbents’ pole position could face competition from new players. Furthermore, with growing compute and data sovereignty laws (EU’s GDPR, China’s PIPL, U.S. state-level rules) and latency-sensitive workloads (autonomous vehicles, fintech, gaming, etc.), we are seeing a push toward localized and edge compute that sits closer to the end user. These smaller systems will have different demands, creating room for innovation.
Google has been at the forefront of advanced DCs for the last decade, and continues to push the envelope. It recently announced the first-ever contracts between a hyperscaler and a U.S. utility focused on AI DC flexibility. The partnership goes beyond day-to-day operations (reacting in real time) and is now also shaping utility planning itself. Google has committed to time-based load shifting and curtailing usage during peak periods to support grid reliability – potentially reducing the need for new power generation and transmission. Most notably, the agreement includes AI workloads, not just traditional CPU-based compute.
There is significant whitespace for new tools to be developed that help other DC operators follow in Google’s footsteps:
1. AI-driven optimization
Hyperscalers already do much of this internally, but smaller DC colos and operators don’t have these tools yet.
2. Power & carbon as dynamic marketplaces
Traditional DCIMs track power usage, but a new wave will actively optimize against real-time energy prices, grid constraints, and carbon intensity. Data centers are some of the largest electricity buyers in the world and will increasingly become market participants. Startups like ElectronX (trading) or WattCarbon (carbon signals) could be directly integrated into DCIM layers.
These grid-integrated operating models are moving from theory to practice: some operators now design facilities to shed load or even sell power back during peak grid demand. Emerging AI-native DCs (CoreWeave, Lambda) are exploring AI job scheduling tied to power market prices – running training workloads when electricity is cheapest.
3. Multi-DC orchestration
Today’s DCIMs are largely siloed per site, but new startups could build “federated DCIMs” that let operators manage multiple colo sites or hybrid deployments from a single command center. Think spatio-temporal optimization across data centers (e.g., EmeraldAI), which is increasingly valuable as customers become open to “power-flex” SLAs (<99% availability to accommodate power peaks) for their compute. These SLAs typically give customers access to either cheaper or previously unavailable GPUs (see the sketch after this list).
4. Customer-facing abstraction
5. Edge or Modular DCs
Many emerging edge DCs don’t want heavyweight Schneider deployments, so there could be opportunity for lightweight, API-first DCIM built specifically for modular and edge sites (e.g., Rune Energy).
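To illustrate items 2 and 3 above, here is a minimal spatio-temporal sketch: given hypothetical hourly price and carbon-intensity forecasts for two sites, pick the site and start hour for a deferrable training job that minimizes a blended cost before its deadline. The site names, forecasts, and carbon weighting are all assumptions, not data from any operator or market.

```python
# Toy spatio-temporal placement for a deferrable job across two sites.
# Prices ($/MWh) and carbon intensity (gCO2/kWh) are illustrative forecasts.

forecasts = {
    "site-east": {"price": [120, 95, 60, 55, 80],  "carbon": [450, 400, 300, 280, 350]},
    "site-west": {"price": [70, 72, 75, 110, 130], "carbon": [200, 210, 220, 260, 300]},
}

JOB_DURATION_HOURS = 2
DEADLINE_HOUR = 5          # job must finish within the forecast window
CARBON_WEIGHT = 0.05       # assumed $-equivalent per gCO2/kWh of preference

def blended_cost(site: str, start: int) -> float:
    """Sum price plus carbon-weighted cost over the job's run window."""
    hours = range(start, start + JOB_DURATION_HOURS)
    price = sum(forecasts[site]["price"][h] for h in hours)
    carbon = sum(forecasts[site]["carbon"][h] for h in hours)
    return price + CARBON_WEIGHT * carbon

# Enumerate every feasible (site, start hour) pair and pick the cheapest.
candidates = [
    (blended_cost(site, start), site, start)
    for site in forecasts
    for start in range(0, DEADLINE_HOUR - JOB_DURATION_HOURS + 1)
]
cost, site, start = min(candidates)
print(f"run on {site} starting hour {start} (blended cost {cost:.1f})")
```

A federated DCIM would do this continuously, across many jobs and sites, with real market and grid signals in place of these toy forecasts.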
Traditional data centers already run with remote monitoring and predictive maintenance, but AI-specific centers face new challenges. GPUs fail more quickly under extreme thermal and power loads, and clusters require rapid swap-outs to avoid costly downtime. Synchronous dependencies make the problem even harder – a single GPU failure can halt an entire distributed training run, compounding the impact of outages.
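To see why synchronous dependencies compound the impact, a quick back-of-the-envelope helps (the per-GPU failure rate below is purely an illustrative assumption). If each GPU fails independently with probability p over some window, a job spanning N GPUs sees at least one interruption with probability:

```latex
P(\text{at least one failure}) = 1 - (1 - p)^{N}
```

Even with an assumed per-GPU failure probability of just 0.01% per week, a 16,000-GPU synchronous job has roughly an 80% chance (1 − 0.9999^16000 ≈ 0.8) of at least one interruption every week – which is why checkpointing, rapid swap-outs, and spare capacity carry such a premium.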
1. Robotic installation & servicing
Human labor is expensive, inconsistent, and slow compared to the cost of idle GPU time. As GPU clusters grow ever larger, we expect to see automated systems (e.g., Mogl or Boost) that can both build new data centers and, once operational, hot-swap blades, cables, and even GPUs – all without human technicians onsite.
2. Predictive maintenance AI
Like any complex industrial facility, AI DCs need better sensors and ML models that flag failing components before they break and incur downtime (a simple sketch follows this list).
3. Modular design
Data center hardware must evolve toward modularity, with racks and pods built for rapid swap-outs rather than slow, manual repairs.
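As a toy illustration of item 2 above – the telemetry fields, thresholds, and rolling z-score rule are assumptions, not any vendor’s method – even a simple statistical baseline over per-GPU sensor streams can flag drifting components before a hard failure:

```python
import statistics

# Toy predictive-maintenance check: flag GPUs whose latest temperature
# reading drifts far from their own recent history. The telemetry and
# thresholds are illustrative assumptions.

WINDOW = 24          # hours of history per GPU
Z_THRESHOLD = 3.0    # standard deviations that count as anomalous

def is_anomalous(history: list[float], latest: float) -> bool:
    """Flag a reading that sits far outside the device's recent baseline."""
    if len(history) < WINDOW:
        return False                      # not enough data to judge
    recent = history[-WINDOW:]
    mean = statistics.fmean(recent)
    stdev = statistics.pstdev(recent) or 1e-9
    return abs(latest - mean) / stdev > Z_THRESHOLD

# Hypothetical telemetry: hourly hotspot temperatures (°C) for one GPU.
temps = [68, 69, 67, 70, 68, 69, 71, 70, 69, 68, 70, 69,
         68, 70, 69, 71, 70, 69, 68, 70, 69, 70, 68, 69]
latest_temp = 84   # sudden jump suggests a failing fan or coolant loop

if is_anomalous(temps, latest_temp):
    print("open a maintenance ticket: pull this GPU at the next swap window")
```

Production systems would use richer signals (ECC errors, NVLink retries, pump telemetry) and learned models, but the goal is the same: catch degradation before it stalls a synchronous training job.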
Startups thrive in moments of flux. The AI ecosystem is still in its early buildout, with models, chips, and hardware evolving rapidly – and new layers of optimization emerging between them. Future models could be faster, more efficient, and run locally on devices. Data centers could be paired with nuclear, geothermal, or advanced solar. No matter how we cut it, it’s clear that the AI opportunity is predicated on an energy opportunity – together they will massively scale compute, creating many trillions of dollars of enterprise value over the coming decades.
Which models will matter most? How will they be used? Will power continue to be the underlying constraint? Which workloads can shift in time, and which power sources will suit them best? These questions remain wide open – and we’re eager to meet founders building toward this uncertain, exciting future.