Last spring, we shared our perspective on the ‘electricity gauntlet’ of AI infrastructure power demand (Part 1, Part 2). In short, today the bottleneck for AI infrastructure isn’t algorithms or capital – it’s physical constraints: power, land, and labor. Of these, power is by far the most challenging and critical.
This reality is now mainstream. Over the past few years, the power sector has been rapidly thrust into the spotlight in Silicon Valley and Washington, and the electricity gauntlet has become one of the most talked-about challenges in technology. By 2035, estimates indicate power demand from AI data centers in the US could grow more than thirtyfold, reaching 123 gigawatts, up from 4 gigawatts in 2024. Companies like GridAstra are seeing huge demand for solutions to rapidly growing interconnection costs and wait times. At GTC 2025, Jensen Huang put it simply: “[AI] revenue is power limited.”
But in five years, the bottleneck may shift. Once electricity supply catches up with demand, the constraint won’t be how fast you can build, but whether the centers you’ve built deliver ROI. That’s when the game flips to optimization, utilization, and flexibility.
This mirrors the lifecycle of any commodity industry. At first, the only goal is to extract – like miners blasting mountains apart just to get ore to market. Efficiency and refining come later, once the raw rush stabilizes.
The AI DC market may follow the same pattern as fiber and telecom in the 1990s. Back then, everyone was racing to lay fiber. The only metric that mattered was miles in the ground. Operators overbuilt, fueled by cheap capital and the belief that bandwidth demand was infinite. The market corrected and consolidated, and the winners spent the next two decades squeezing efficiency from that installed base – better routing, denser multiplexing, more sophisticated pricing, and tiered services.
So where are we in this analogy for the AI DC buildout? We are in Phase 1.
Given the power conversation of Phase 1 is getting its deserved attention, let’s focus on what will be needed in Phases 2 and 3 of market maturity.
There are two challenges that will dominate these next two phases:
In modern AI data centers, GPUs are now the bulk of the build cost. In the 2010s, traditional DCs were built with IT hardware at ~30–40% of the all-in build cost and infrastructure (land, power, cooling, backup systems) at ~60–70%. Today, modern AI clusters see hardware – now dominated by GPUs and accelerators – closer to 70%+ of total build cost, depending on design. For example, Microsoft’s 2026 data center in Wisconsin is a $4B project, with the bulk of the cost going into hundreds of thousands of B200 GPUs. This is a major inversion of economics. If GPUs dominate costs, every percentage point of idle time is tens of millions in wasted capex: a high-density rack of B200s ($4M upfront cost) sitting at 40% utilization burns through cash much faster than a marginally inefficient cooling system.
So what does that imply will drive value for DCs in Phases 2 and 3? GPU utilization – “effective FLOPs delivered per watt per dollar,” or, more precisely, “token output per watt per dollar per second.” GPU utilization is a complex problem, and one of the key optimization levers the best operators will need to get right.
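To make the economics concrete, here is a minimal back-of-the-envelope sketch. The rack price, depreciation window, power draw, and token throughput below are illustrative assumptions, not vendor or operator figures; the point is how quickly idle capacity compounds, and how a “tokens per watt per dollar”–style metric falls out of the same inputs.

```python
# Back-of-the-envelope economics of an idle GPU rack.
# All inputs are illustrative assumptions, not vendor or operator figures.

RACK_CAPEX_USD = 4_000_000        # high-density GPU rack, upfront cost
DEPRECIATION_YEARS = 5            # assumed useful life
RACK_POWER_KW = 120               # assumed average IT draw at full load
ELECTRICITY_USD_PER_KWH = 0.08    # assumed blended power price
TOKENS_PER_SEC_AT_FULL_LOAD = 2_000_000  # assumed aggregate throughput

HOURS_PER_YEAR = 8760

def annual_cost(utilization: float) -> dict:
    """Annual cost and efficiency of one rack at a given utilization (0-1)."""
    capex_per_year = RACK_CAPEX_USD / DEPRECIATION_YEARS
    energy_kwh = RACK_POWER_KW * HOURS_PER_YEAR * utilization
    power_cost = energy_kwh * ELECTRICITY_USD_PER_KWH
    tokens = TOKENS_PER_SEC_AT_FULL_LOAD * 3600 * HOURS_PER_YEAR * utilization
    total_cost = capex_per_year + power_cost
    return {
        "capex_sitting_idle": capex_per_year * (1 - utilization),
        "tokens_per_dollar": tokens / total_cost,
        "tokens_per_kwh": tokens / energy_kwh if energy_kwh else 0.0,
    }

for u in (0.4, 0.7, 0.9):
    stats = annual_cost(u)
    print(f"utilization {u:.0%}: "
          f"~${stats['capex_sitting_idle']:,.0f}/yr of capex idle, "
          f"{stats['tokens_per_dollar']:,.0f} tokens/$, "
          f"{stats['tokens_per_kwh']:,.0f} tokens/kWh")
```

At fleet scale – hundreds or thousands of such racks – each percentage point of idle time translates into millions to tens of millions of dollars of stranded capex per year, which is where the “effective output per watt per dollar” framing comes from.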
Before data center utilization can be optimized, data center performance needs to be accurately measured. One of the greatest uncertainties in the data center equation is how much power is actually turned into compute, or tokens. Misconceptions around “always-on” consumption are pervasive. Regulators and utilities frequently assume DCs run at ~90% load factor – ratio of (average electricity demand) to (peak electricity demand) year‑round – with statements like “operating at a consistent load factor 365 days a year” from Duke Energy.
This is a broad oversimplification of data centers’ power consumption, performance, and complex load profiles, and the standard metrics are misleading. For example, server uptime is an inaccurate proxy for token output because uptime ensures availability, not active compute. The de facto standard metric since 2007, Power Usage Effectiveness (PUE), is imprecise too – it measures energy overhead, not actual compute delivered.
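For reference, both metrics are simple ratios, and neither term involves compute output – a facility can post a strong load factor and a respectable PUE while its GPUs sit largely idle:

```latex
\text{Load factor} = \frac{\text{average electricity demand}}{\text{peak electricity demand}},
\qquad
\text{PUE} = \frac{\text{total facility energy}}{\text{IT equipment energy}}
```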
Focusing on utilization exposes real inefficiencies, and significant untapped revenue in most data centers. Most colos (companies renting out cages and racks) operate at 30–50% utilization, and even best-in-class hyperscalers find it difficult to sustain utilization above 60–70%. This inefficiency is often underappreciated because transparent, high-quality utilization data is scarce. A 2025 Association for Computing Machinery (ACM) study highlighted this gap: energy consumption is rarely linked to compute capacity, system configuration, or workload type in a consistent way. Few providers disclose actual utilization metrics, leaving most infrastructure planning and operations without a solid empirical foundation.
The nuances of data centers’ peaky load profiles should directly impact how compute jobs are orchestrated and utilized. GPU workload orchestration (e.g., Run:AI, Lambda, CoreWeave) aims to pack jobs more tightly so GPUs don’t sit idle. This is hard: AI power loads are energy-dense, spiky, and “bursty” – both at the job level and across the facility load profile.
The combination of insufficient performance metrics, spiky and uncertain load profiles, and a lack of “power-aware” orchestration results in the stubbornly low utilization rates we see today.
The opportunities we are focused on are in orchestration software, dynamic scheduling systems, “power-aware” workload balancing, and flexibility programs – a kind of “Kubernetes moment” for data center infrastructure. Kubernetes took the messy plumbing of the software world and abstracted bespoke configs and environments into a containerized, standardized orchestration layer. The same portability and optimization are becoming possible at the compute hardware level, particularly for less latency-sensitive workloads – nightly retraining, model fine-tuning, video indexing, video generation, search ranking, synthetic data generation, summarization – facilitating more portable, programmable, and dynamically orchestrated data centers.
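As a minimal sketch of what “power-aware” orchestration means in practice – with made-up job names, power numbers, and a simple greedy policy rather than any particular vendor’s scheduler – deferrable jobs are held back whenever the facility is near its power cap or prices spike, while latency-sensitive jobs are always admitted:

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    power_kw: float        # estimated draw while running
    deferrable: bool       # can it wait for a cheaper/quieter window?

# Illustrative facility state and job queue (all numbers are assumptions).
FACILITY_POWER_CAP_KW = 50_000
current_load_kw = 41_000
spot_price_usd_mwh = 140
PRICE_CEILING_USD_MWH = 90   # above this, deferrable work waits

queue = [
    Job("inference-chat", 3_000, deferrable=False),
    Job("nightly-retrain", 8_000, deferrable=True),
    Job("video-indexing", 5_000, deferrable=True),
]

def schedule(jobs, load_kw):
    """Greedy, power-aware admission: run what fits; defer flexible work
    when the facility is near its cap or power is expensive."""
    started, deferred = [], []
    for job in jobs:
        over_cap = load_kw + job.power_kw > FACILITY_POWER_CAP_KW
        pricey = spot_price_usd_mwh > PRICE_CEILING_USD_MWH
        if job.deferrable and (over_cap or pricey):
            deferred.append(job.name)
        elif not over_cap:
            started.append(job.name)
            load_kw += job.power_kw
        else:
            deferred.append(job.name)   # nothing fits; wait either way
    return started, deferred

started, deferred = schedule(queue, current_load_kw)
print("start now:", started)     # ['inference-chat']
print("defer:", deferred)        # ['nightly-retrain', 'video-indexing']
```

Real orchestrators layer on far more (topology, checkpointing, SLAs), but the core idea is the same: treat power headroom and price as first-class scheduling inputs alongside GPU availability.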
Today, the Siemens and Schneiders of the world capture most of the value in Data Center Infrastructure Management (DCIM) tools. Given most new data center projects are designed for huge scale (100MW and up), the approach is reasonably risk-averse – most operators are unlikely to partner with startups in running these capital-intensive AI factories. This is also true for other core infrastructure, like advanced cooling systems.
However, as the demands and precision required for optimal compute performance increase, the incumbents’ pole position could face competition from new players. Furthermore, with growing compute and data sovereignty laws (EU’s GDPR, China’s PIPL, U.S. state-level rules) and latency-sensitive workloads (autonomous vehicles, fintech, gaming, etc.), we are seeing a push toward localized and edge compute that sits closer to the end user. These smaller systems will have different demands, creating room for innovation.
Google has been at the forefront of advanced DCs for the last decade, and continues to push the envelope. It recently announced the first-ever contracts between a hyperscaler and a U.S. utility focused on AI DC flexibility. The partnership goes beyond day-to-day operations (reacting in real time) and is now also shaping utility planning itself. Google has committed to time-based load shifting and curtailing usage during peak periods to support grid reliability – potentially reducing the need for new power generation and transmission. Most notably, the agreement includes AI workloads, not just traditional CPU-based compute.
There is significant whitespace for new tools to be developed that help other DC operators follow in Google’s footsteps:
1. AI-driven optimization
Hyperscalers already do much of this internally, but smaller DC colos and operators don’t have these tools yet.
2. Power & carbon as dynamic marketplaces
Traditional DCIMs track power usage, but a new wave will actively optimize against real-time energy prices, grid constraints, and carbon intensity. Data centers are some of the largest electricity buyers in the world and will increasingly become market participants. Startups like ElectronX (trading) or WattCarbon (carbon signals) could be directly integrated into DCIM layers.
These grid-integrated operating models are moving from theory to practice: some operators now design facilities to shed load or even sell power back during peak grid demand. Emerging AI-native DCs (CoreWeave, Lambda) are exploring AI job scheduling tied to power market prices – running training workloads when electricity is cheapest.
3. Multi-DC orchestration
Today’s DCIMs are largely siloed per site, but new startups could build “federated DCIMs” that let operators manage multiple colo sites or hybrid deployments from a single command center. Think spatio-temporal optimization across data centers (e.g., EmeraldAI), which is increasingly valuable as customers become open to “power-flex” SLAs (<99% availability to accommodate power peaks) for their compute. These SLAs typically give customers access to either cheaper or previously unavailable GPUs (see the sketch after this list).
4. Customer-facing abstraction
5. Edge or Modular DCs
Many emerging edge DCs don’t want heavyweight Schneider deployments, so there could be opportunity for lightweight, API-first DCIM built specifically for modular and edge sites (e.g., Rune Energy).
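To illustrate items 2 and 3 above, here is a minimal spatio-temporal sketch: given hypothetical hourly price and carbon-intensity forecasts for two sites, pick the site and start hour for a deferrable training job that minimizes a blended cost before its deadline. The site names, forecasts, and carbon weighting are all assumptions, not data from any operator or market.

```python
# Toy spatio-temporal placement for a deferrable job across two sites.
# Prices ($/MWh) and carbon intensity (gCO2/kWh) are illustrative forecasts.

forecasts = {
    "site-east": {"price": [120, 95, 60, 55, 80],  "carbon": [450, 400, 300, 280, 350]},
    "site-west": {"price": [70, 72, 75, 110, 130], "carbon": [200, 210, 220, 260, 300]},
}

JOB_DURATION_HOURS = 2
DEADLINE_HOUR = 5          # job must finish within the forecast window
CARBON_WEIGHT = 0.05       # assumed $-equivalent per gCO2/kWh of preference

def blended_cost(site: str, start: int) -> float:
    """Sum price plus carbon-weighted cost over the job's run window."""
    hours = range(start, start + JOB_DURATION_HOURS)
    price = sum(forecasts[site]["price"][h] for h in hours)
    carbon = sum(forecasts[site]["carbon"][h] for h in hours)
    return price + CARBON_WEIGHT * carbon

# Enumerate every feasible (site, start hour) pair and pick the cheapest.
candidates = [
    (blended_cost(site, start), site, start)
    for site in forecasts
    for start in range(0, DEADLINE_HOUR - JOB_DURATION_HOURS + 1)
]
cost, site, start = min(candidates)
print(f"run on {site} starting hour {start} (blended cost {cost:.1f})")
```

A federated DCIM would do this continuously, across many jobs and sites, with real market and grid signals in place of these toy forecasts.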
Traditional data centers already run with remote monitoring and predictive maintenance, but AI-specific centers face new challenges. GPUs fail more quickly under extreme thermal and power loads, and clusters require rapid swap-outs to avoid costly downtime. Synchronous dependencies make the problem even harder – a single GPU failure can halt an entire distributed training run, compounding the impact of outages.
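To see why synchronous dependencies compound the impact, a quick back-of-the-envelope helps (the per-GPU failure rate below is purely an illustrative assumption). If each GPU fails independently with probability p over some window, a job spanning N GPUs sees at least one interruption with probability:

```latex
P(\text{at least one failure}) = 1 - (1 - p)^{N}
```

Even with an assumed per-GPU failure probability of just 0.01% per week, a 16,000-GPU synchronous job has roughly an 80% chance (1 − 0.9999^16000 ≈ 0.8) of at least one interruption every week – which is why checkpointing, rapid swap-outs, and spare capacity carry such a premium.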
1. Robotic installation & servicing
Human labor is expensive, inconsistent, and slow compared to the cost of idle GPU time. As GPU clusters grow ever larger, we expect to see automated systems (e.g., Mogl or Boost) that can both build new data centers and, once operational, hot-swap blades, cables, and even GPUs – all without human technicians onsite.
2. Predictive maintenance AI
Like any complex industrial facility, AI DCs need better sensors and ML models that flag failing components before they break and incur downtime (a simple sketch follows this list).
3. Modular design
Data center hardware must evolve toward modularity, with racks and pods built for rapid swap-outs rather than slow, manual repairs.
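As a toy illustration of item 2 above – the telemetry fields, thresholds, and rolling z-score rule are assumptions, not any vendor’s method – even a simple statistical baseline over per-GPU sensor streams can flag drifting components before a hard failure:

```python
import statistics

# Toy predictive-maintenance check: flag GPUs whose latest temperature
# reading drifts far from their own recent history. The telemetry and
# thresholds are illustrative assumptions.

WINDOW = 24          # hours of history per GPU
Z_THRESHOLD = 3.0    # standard deviations that count as anomalous

def is_anomalous(history: list[float], latest: float) -> bool:
    """Flag a reading that sits far outside the device's recent baseline."""
    if len(history) < WINDOW:
        return False                      # not enough data to judge
    recent = history[-WINDOW:]
    mean = statistics.fmean(recent)
    stdev = statistics.pstdev(recent) or 1e-9
    return abs(latest - mean) / stdev > Z_THRESHOLD

# Hypothetical telemetry: hourly hotspot temperatures (°C) for one GPU.
temps = [68, 69, 67, 70, 68, 69, 71, 70, 69, 68, 70, 69,
         68, 70, 69, 71, 70, 69, 68, 70, 69, 70, 68, 69]
latest_temp = 84   # sudden jump suggests a failing fan or coolant loop

if is_anomalous(temps, latest_temp):
    print("open a maintenance ticket: pull this GPU at the next swap window")
```

Production systems would use richer signals (ECC errors, NVLink retries, pump telemetry) and learned models, but the goal is the same: catch degradation before it stalls a synchronous training job.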
Startups thrive in moments of flux. The AI ecosystem is still in its early buildout, with models, chips, and hardware evolving rapidly – and new layers of optimization emerging between them. Future models could be faster, more efficient, and run locally on devices. Data centers could be paired with nuclear, geothermal, or advanced solar. No matter how we cut it, it’s clear that the AI opportunity is predicated on an energy opportunity – together they will massively scale compute, creating many trillions of dollars of enterprise value over the coming decades.
Which models will matter most? How will they be used? Will power continue to be the underlying constraint? Which workloads can shift in time, and which power sources will suit them best? These questions remain wide open – and we’re eager to meet founders building toward this uncertain, exciting future.