How AI inference is priced, built, financed, and scaled — the economics behind every API call.
The technical pipeline follows the token from request to response. This page follows the dollar — from the cost of a single GPU-hour through pricing, business models, and capital markets.
Cost per million tokens = Total hourly GPU cost ÷ tokens served per hour. Throughput optimization is literally margin expansion.
A restaurant's cost per meal — ingredients, rent, labor, utilities. Two restaurants with identical kitchens can have wildly different costs per plate based on how many covers they turn.
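The headline formula can be sketched in a few lines; the $2.20/hr rate and throughput figures are illustrative, drawn from the tables that follow.

```python
# Sketch of the headline formula: cost per million tokens from GPU economics.
# Rate and throughput numbers are illustrative, not provider quotes.

def cost_per_million_tokens(gpu_cost_per_hr: float, tokens_per_sec: float) -> float:
    """Total hourly GPU cost divided by tokens served per hour, in $/M tokens."""
    tokens_per_hr = tokens_per_sec * 3_600
    return gpu_cost_per_hr / tokens_per_hr * 1_000_000

# Same $2.20/hr H100, two throughput levels: doubling tokens/sec halves cost.
naive = cost_per_million_tokens(2.20, 300)      # ~$2.04/M tokens
optimized = cost_per_million_tokens(2.20, 600)  # ~$1.02/M tokens
print(f"naive: ${naive:.2f}/M, optimized: ${optimized:.2f}/M")
```

Because cost is linear in throughput, every optimization multiplier in the stack below translates one-for-one into cost-per-token reduction.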
| Component | Crusoe (owns infra) | CoreWeave (leases some) |
|---|---|---|
| GPU CapEx amortized (3yr) | ~$0.95/hr | ~$1.10/hr |
| Power | ~$0.03/hr | ~$0.07/hr |
| Data center (amortized) | ~$0.15/hr | ~$0.25/hr |
| Networking | ~$0.10/hr | ~$0.10/hr |
| Operations/platform | ~$0.10/hr | ~$0.12/hr |
| Total cost per H100-hr | ~$1.33/hr | ~$1.64/hr |
| Selling price | $2.20/hr | $2.20/hr |
| Gross Margin | ~40% | ~34% |
GPU purchase: H100 SXM ~$25-30K, B200 ~$30-40K. Purchased GPUs amortize over 3-5 years — cheaper per hour if utilization is high, but depreciation risk from technology obsolescence. Rented GPUs are OpEx at ~$2.00-3.00/hr mid-market (down from $7-8/hr peak).
GPU failure rate: ~2-5% annually, requiring spare buffer inventory. At 10,000 GPUs, expect 200-500 failures per year.
| GPU | On-Demand | Spot | Provider |
|---|---|---|---|
| H200 SXM | $4.29/hr | — | Crusoe |
| H100 SXM | $3.90/hr | $1.60/hr | Crusoe |
| A100 80 GB | $1.95/hr | $1.30/hr | Crusoe |
| MI300X | $3.45/hr | $0.95/hr | Crusoe |
Spot pricing fills idle capacity at 50-70% discounts. Workloads must tolerate preemption (checkpointing required), making spot ideal for batch inference and fault-tolerant training but unsuitable for latency-sensitive serving.
Power cost varies wildly: $0.03-0.05/kWh (Crusoe’s stranded energy) vs $0.08-0.12/kWh (grid in Northern Virginia). A single H100 draws ~700W under load. At $0.10/kWh = $0.07/hr per GPU. At 10,000 GPUs = $6.1M/year in electricity.
PUE (Power Usage Effectiveness): 1.1 means 10% cooling overhead; 1.4 means 40%. Liquid cooling pushes closer to 1.1, air cooling sits at 1.3-1.5. Every 0.1 improvement in PUE across a 100MW facility saves ~$600K/year.
InfiniBand for multi-GPU inference adds 15-25% to total cluster cost. For a 1,000 H100 cluster: $5-15M in NICs, switches, and cabling. NVIDIA/Mellanox near-monopoly limits price negotiation.
Storage for model weights, KV cache spill, logging: ~5-10% of compute cost.
→ How InfiniBand works in the pipeline → Training hardware costs comparison

Operational costs include SRE/DevOps staff, monitoring infrastructure, on-call rotations, and platform orchestration (Kubernetes, Slurm, auto-node-replacement).

GPU failures at 2-5% annually mean a 10,000-GPU fleet needs constant triage — detecting degraded GPUs, migrating workloads, RMA processing. This is where OpEx quietly compounds.
Important distinction: GPU TDP is just the chip. A server slot includes CPU, system memory, NICs, NVSwitches, fans, and power supply losses. Facility power adds PUE overhead. Always use facility power per GPU slot for cost calculations.
| GPU | TDP/GPU | Server Config | Server W/GPU | Facility W/GPU (PUE 1.10) |
|---|---|---|---|---|
| H100 SXM | 700W | HGX 8-GPU: ~10.2 kW | ~1,275W | ~1,400W |
| H200 SXM | 700W | HGX 8-GPU: ~10.2 kW | ~1,275W | ~1,400W |
| B200 | 1,000W | HGX 8-GPU: ~14.3 kW | ~1,790W | ~1,970W |
| GB200 (NVL72) | ~1,200W | NVL72: ~120 kW / 72 | ~1,667W | ~1,833W |
Electricity cost per GPU-hour = (Facility kW per GPU) × $/kWh
| GPU | Facility kW | At $0.10/kWh (grid) | At $0.03/kWh (Crusoe) | Delta |
|---|---|---|---|---|
| H100 | 1.40 kW | $0.140/hr | $0.042/hr | $0.098/hr |
| B200 | 1.97 kW | $0.197/hr | $0.059/hr | $0.138/hr |
| GB200 | 1.83 kW | $0.183/hr | $0.055/hr | $0.128/hr |
At 85% utilization over a year (7,446 hrs), electricity per H100: Grid = $1,042/yr. Crusoe = $313/yr. Delta = $730/yr per GPU. At 10,000 GPUs: $7.3M/year saved — just from electricity.
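The electricity math above can be reproduced directly; 1.40 kW facility power per H100 slot and 85% utilization come from the tables, while the fleet size is the 10,000-GPU example.

```python
# Reproducing the electricity math: facility kW per GPU slot x $/kWh,
# annualized at 85% utilization. Figures from the tables above.

FACILITY_KW_H100 = 1.40                 # server W/GPU x PUE 1.10, per the table
hours = 8_760 * 0.85                    # 7,446 hrs/yr at 85% utilization

def electricity_per_gpu_hr(facility_kw: float, usd_per_kwh: float) -> float:
    return facility_kw * usd_per_kwh

grid = electricity_per_gpu_hr(FACILITY_KW_H100, 0.10) * hours     # ~$1,042/yr
crusoe = electricity_per_gpu_hr(FACILITY_KW_H100, 0.03) * hours   # ~$313/yr
fleet_savings = (grid - crusoe) * 10_000                          # ~$7.3M/yr
print(f"grid ${grid:,.0f}/yr, crusoe ${crusoe:,.0f}/yr, "
      f"fleet delta ${fleet_savings/1e6:.1f}M")
```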
| Component | Grid Competitor (colo) | Crusoe (owned) | Crusoe (renting colo) |
|---|---|---|---|
| GPU amortization (3yr) | $1.07/hr | $1.07/hr | $1.07/hr |
| Electricity | $0.14/hr | $0.04/hr | $0.13/hr |
| Infrastructure | $0.35/hr | $0.15/hr | $0.41/hr |
| Networking | $0.19/hr | $0.19/hr | $0.19/hr |
| Operations | $0.10/hr | $0.10/hr | $0.10/hr |
| Total cost | $1.85/hr | $1.55/hr | $1.90/hr |
| Selling price | $2.50/hr | $2.20/hr | $2.50/hr |
| Gross Margin | 26% | 30% | 24% |
When Crusoe rents colo, both structural advantages vanish: electricity ($0.04 → $0.13) and infrastructure ($0.15 → $0.41). GPUs, networking, and operations cost the same regardless. Total delta between owned and renting: $0.35/hr per GPU. At 100 MW (~78,400 GPUs, 85% util): ~$204M/year the ownership model saves.
This is the entire argument for vertical integration in one table. Without ownership, Crusoe competes on the same cost structure as everyone else. The $0.35/hr advantage is the business model.
When renting makes sense anyway: Market validation (deploy 500 GPUs to prove demand before committing $500M+ to build), geographic reach (low-latency in Singapore/Frankfurt), and burst overflow (owned at 90%, rent the spike). The blend is what matters — own 80% base load, rent 20% flex.
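The ~$204M/year ownership claim follows mechanically from the per-GPU-hour delta; the sketch below spells it out using the $0.35/hr gap and the 100 MW fleet sizing from above.

```python
# The "$204M/year" claim, spelled out: per-GPU-hour cost delta between
# owned and rented-colo footprints, scaled to a 100 MW H100 fleet.

owned_cost, colo_cost = 1.55, 1.90          # $/GPU-hr from the table
delta_per_hr = colo_cost - owned_cost       # $0.35/hr

gpus_per_mw = 1_000_000 // 1_275            # ~784 H100 slots per MW of server power
fleet = gpus_per_mw * 100                   # ~78,400 GPUs at 100 MW
hours = 8_760 * 0.85                        # 85% utilization

annual_advantage = delta_per_hr * fleet * hours
print(f"~${annual_advantage/1e6:.0f}M/year")   # ~$204M/year
```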
Every infrastructure cost can be converted to $/GPU-hour using facility kW per GPU slot as the bridge:
Colo rate → per-GPU: At US avg $184/kW/mo, an H100 (1.40 kW): $258/mo infra = $0.35/hr. At N. Virginia $215/kW/mo: $301/mo = $0.41/hr. Infrastructure alone costs $0.35-0.41/hr before GPU amortization, networking, or electricity.
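The conversion above is a two-step bridge: monthly colo rate times kW per slot, then divided by 730 hours per month (8,760 / 12). A minimal sketch:

```python
# Bridging colo pricing ($/kW/month) to per-GPU-hour infrastructure cost,
# using facility kW per GPU slot as the bridge. 730 hrs/mo = 8,760 / 12.

def colo_to_gpu_hr(rate_per_kw_month: float, facility_kw_per_gpu: float) -> float:
    monthly = rate_per_kw_month * facility_kw_per_gpu  # $/GPU/month
    return monthly / 730                               # $/GPU-hr

us_avg = colo_to_gpu_hr(184, 1.40)   # US average -> ~$0.35/hr per H100
nova = colo_to_gpu_hr(215, 1.40)     # Northern Virginia -> ~$0.41/hr
```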
Throughput optimization IS margin expansion. If custom CUDA kernels serve 2x the tokens/sec on the same GPU, cost per token halves.
Two factories with identical machines. One runs at 30% capacity with long changeover times; the other runs at 85% with quick changeovers. Same capital cost, dramatically different unit economics.
Serving an 8B model is ~10x cheaper per token than 405B. The 405B model needs 8-16 GPUs (vs 1), performs 50x more matrix operations per token, and the larger KV cache per token reduces batch slots from 64+ to 8-16.
| Factor | 8B | 405B | Penalty |
|---|---|---|---|
| Weight data per token | ~16 GB | ~810 GB | ~50x |
| GPUs required | 1 | 8-16 | 8-16x cost |
| Communication overhead | Zero | 126 all-reduce ops | Pure penalty |
| Batch size | 64+ | 8-16 | 4-8x less throughput |
| Cost / M tokens | $0.03-0.05 | $0.50-1.00 | 10-20x |
The same model on the same GPU can have 3-5x throughput variance between naive and fully optimized serving. This includes custom attention kernels (FlashAttention, PagedAttention), quantization (FP16 → INT4), continuous batching, speculative decoding, and prefix caching.
This is why inference platform companies like Fireworks exist — their serving infrastructure (FireAttention, MemoryAlloy) extracts dramatically more tokens/sec from the same hardware.
→ Performance Optimization Stack details

A GPU serving one request at a time wastes most of its compute capacity. Continuous batching enables 80-90% utilization by adding new requests to an in-flight batch as slots free up.
Request characteristics matter: long context uses more KV cache memory, reducing batch slots. Output tokens cost more than input tokens because decode is sequential while prefill is parallel.
→ How continuous batching works

Reasoning models add internal reasoning streams on top of visible output. Effective capacity planning must split visible tokens from reasoning tokens, then cap runtime via reasoning effort or thinking budget controls.
Interleaved thinking increases scheduler contention because reasoning and tool calls share the same decode loop. Preserved thinking can reduce repeated work across turns, but raises carried-context cost if history policies are too permissive.
→ Decode behavior for reasoning models

Hardware generations create step-function improvements. B200 with FP4 serves the same model at dramatically higher throughput than H100 at FP16. Blackwell delivers 3-5x performance per dollar vs Hopper.
This drives the technology obsolescence risk in GPU ownership — an H100 purchased today may be economically obsolete before fully depreciated.
The same model on the same GPU varies 15-17x in throughput depending on optimization depth. Each layer compounds on the previous:
| Layer | Technique | Multiplier | Cumulative tok/s |
|---|---|---|---|
| Baseline | HuggingFace defaults, FP16, static batching | 1.0x | ~300 |
| 1 | Continuous batching (dynamic slot mgmt) | ~2.5x | ~750 |
| 2 | PagedAttention (KV cache virtualization) | ~1.7x | ~1,275 |
| 3 | FlashAttention (fused ops, SRAM tiling) | ~1.3x | ~1,658 |
| 4 | Quantization (FP16 → FP8/FP4) | ~1.8x | ~2,984 |
| 5 | Custom CUDA kernels (fused, arch-specific) | ~1.3x | ~3,879 |
| 6 | Speculative decoding (draft + verify) | ~1.3x | ~5,043 |
Layers 1-3 are table stakes (vLLM, TensorRT-LLM implement them). Layers 4-6 require deep GPU engineering — custom kernels that outperform open-source take years of accumulated expertise. This is the core moat of inference platforms like Fireworks (FireAttention) and Crusoe (MemoryAlloy).
Why the gap persists: Each new model architecture (MoE, multi-modal, new attention patterns) requires new kernel development. The optimization target keeps moving.
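The cumulative column in the table is just the product of the layer multipliers; the small ±1 tok/s differences come from per-row rounding. A sketch:

```python
# Compounding the optimization layers from the table: each multiplier
# applies to the previous cumulative throughput.

layers = [
    ("continuous batching",   2.5),
    ("PagedAttention",        1.7),
    ("FlashAttention",        1.3),
    ("quantization FP8/FP4",  1.8),
    ("custom CUDA kernels",   1.3),
    ("speculative decoding",  1.3),
]

tok_s = 300.0   # HuggingFace-defaults baseline, FP16, static batching
for name, mult in layers:
    tok_s *= mult
    print(f"{name:22s} -> ~{tok_s:,.0f} tok/s")

print(f"total: ~{tok_s / 300:.1f}x over baseline")   # ~16.8x, the 15-17x range
```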
Formula: Revenue/GPU-hr = tok/sec × 3,600 × $/token
Example: Llama 3.1 70B on optimized H100 cluster:
| Metric | Unoptimized | Fully Optimized |
|---|---|---|
| Output throughput | ~200 tok/s | ~1,200 tok/s |
| Output price ($0.90/M) | $0.65/hr | $3.89/hr |
| Input throughput | ~2,000 tok/s | ~8,000 tok/s |
| Input price ($0.30/M) | $2.16/hr | $8.64/hr |
| Blended revenue/GPU-hr | $1.25/hr | $5.80/hr |
The punchline: At Crusoe’s $1.55/hr cost, unoptimized serving barely breaks even ($1.25/hr). Fully optimized yields ~73% gross margin ($5.80/hr). A 6x throughput improvement is a 6x revenue increase at constant pricing.
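The revenue formula applied to the 70B example is a one-liner per stream. The blended figure in the table depends on the input/output traffic mix, which isn't fixed here, so this sketch computes each stream's standalone revenue rate.

```python
# Revenue/GPU-hr = tok/sec x 3,600 x $/token, per the formula above.
# Throughput and price figures are the Llama 3.1 70B example values.

def revenue_per_gpu_hr(tok_per_sec: float, usd_per_m_tokens: float) -> float:
    return tok_per_sec * 3_600 * usd_per_m_tokens / 1_000_000

# Fully optimized H100 cluster serving Llama 3.1 70B
out_rev = revenue_per_gpu_hr(1_200, 0.90)   # ~$3.89/hr at $0.90/M output
in_rev = revenue_per_gpu_hr(8_000, 0.30)    # ~$8.64/hr at $0.30/M input
```

The blended $5.80/hr in the table sits between these two rates, weighted by how much of each GPU-hour goes to prefill versus decode.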
Formula: GPUs/MW = 1,000,000W ÷ server W/GPU slot
| GPU | W/slot | GPUs/MW | GPU Rental (85% util) | Managed Inference (85%) |
|---|---|---|---|---|
| H100 | 1,275W | ~784 | $12.8M/MW/yr | $33.8M/MW/yr |
| B200 | 1,790W | ~559 | $9.1M/MW/yr | $24.1M/MW/yr |
| GB200 | 1,667W | ~600 | $9.8M/MW/yr | $25.9M/MW/yr |
100 MW H100 facility (85% util): GPU rental ~$1.28B/yr. Managed inference ~$3.38B/yr. Industry benchmark: AI facilities generate ~$12.50/watt/year (3x traditional DCs at $4.20/watt). Best operators already exceed $30/watt.
“Revenue per watt” is the master metric for AI data centers. It bridges the infrastructure world to the compute world in a single number.
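The revenue-per-watt figures above can be reproduced from GPUs/MW and revenue per GPU-hour; the $5.80/hr managed-inference rate is taken from the optimized 70B example earlier and used here as an assumption.

```python
# "Revenue per watt": GPU slots per MW x revenue per GPU-hour x utilized
# hours, divided by watts. Server W/slot per the power table above.

def revenue_per_watt_year(server_w_per_gpu: float, rev_per_gpu_hr: float,
                          utilization: float = 0.85) -> float:
    gpus_per_mw = 1_000_000 / server_w_per_gpu
    annual_per_mw = gpus_per_mw * rev_per_gpu_hr * 8_760 * utilization
    return annual_per_mw / 1_000_000    # $/watt/year

h100_rental = revenue_per_watt_year(1_275, 2.20)    # ~$12.8/W/yr
h100_managed = revenue_per_watt_year(1_275, 5.80)   # ~$33.9/W/yr
```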
| Business Model | Gross Margin | Why |
|---|---|---|
| Managed inference (asset-light) | 55-70% | Software optimization on popular models; don’t own infra |
| GPU rental (asset-heavy) | 25-35% | Capital-intensive; GPU amortization is largest cost |
| Crusoe managed (rental + software) | 40-55% | MemoryAlloy layer on owned infra; blended product mix |
| Crusoe energy delta | +5-6 pts | $0.03 vs $0.07-0.10/kWh flows to margin or pricing |
Every dollar shifted from “infrastructure rental revenue” to “software-delivered managed inference” potentially increases the valuation multiple. Infrastructure trades at 5-8x revenue; software with recurring revenue at 15-20x.
Frontier inference cost is declining ~10x annually. Reserved contracts are economically similar to fixed-rate swaps on GPU compute prices.
Airline pricing — first class, economy, standby, and corporate contracts all sell the same seat-mile at wildly different prices based on flexibility, commitment, and timing.
| Segment | Volume | Pricing Model | Margin |
|---|---|---|---|
| Hobbyist / prototyping | <1M tok/day | Serverless per-token, free tier | Low/negative |
| Growth startup | 1-100M tok/day | Serverless → on-demand | Medium, expanding |
| Enterprise | 100M+ tok/day | Reserved capacity, custom | High |
| Batch / offline | Large, flexible | Batch pricing (50% off) | Medium (fills idle) |
GPT-3.5 equivalent pricing: $20/M tokens (late 2022) → $0.40/M tokens (2025). A 50x decline in ~2.5 years. Reserved contracts lock in today’s price — if costs keep falling, the provider profits on the spread while customers overpay for certainty.
Standard for API providers. Input/output split is critical: output tokens cost 2-4x more to serve because decode is sequential. Without the split, adverse selection occurs — output-heavy workloads subsidized by input-heavy ones.
Cached input pricing (~50% discount) incentivizes consistent prefixes, improving server-side prefix cache hit rates. This is pricing that shapes behavior to reduce costs.
| Model Size Tier | Input $/M | Output $/M |
|---|---|---|
| Under 4B | $0.10 | $0.10 |
| 4B – 16B | $0.20 | $0.20 |
| 16B – 80B | $0.90 | $0.90 |
| MoE 56B – 176B | $0.50 | $1.20 |
Fireworks serverless tiers (open-source models). Cached input at 50% discount; batch mode adds another 50% discount for non-real-time workloads. The MoE output premium reflects decode-bound compute despite efficient routing.
→ Rate limits as pricing levers

For reasoning models, many teams now separate visible output from reasoning streams (for example reasoning_content) and apply policy caps with effort, budget, and history controls.
Fine-tuned offerings add another pricing layer: base model usage + adapter premium. Fireworks-style LoRA deployment economics (including multi-adapter serving and one-click adapter deployment) allow providers to segment by workflow quality, not just raw token volume.
| Model | Input $/M | Output $/M | Ratio |
|---|---|---|---|
| DeepSeek V3 (671B MoE) | $0.56 | $1.68 | 3.0× |
| DeepSeek R1 (reasoning) | $3.00 | $8.00 | 2.7× |
| GLM-5-0520 (vision) | $1.50 | $3.00 | 2.0× |
| Kimi K2.5 (MoE) | $0.48 | $1.44 | 3.0× |
Output tokens cost 2–5× input across all models — the ratio reflects decode’s sequential nature. Reasoning models carry the highest absolute rates because extended chain-of-thought generates 5–20× more output tokens per query.
→ Fine-tuning methods and deployment implications

Revenue predictability for both sides. Reserved contracts are economically similar to fixed-rate swaps: provider receives fixed payments, effectively pays floating (market cost of delivering compute). In a deflationary environment, existing contracts become more valuable.
The real risk isn’t the contract price — it’s what the contract signals about capacity planning. Contracted capacity is hedged; uncontracted capacity is an outright bet on future pricing.
Like standby airline tickets — fills idle capacity. Customer gives up latency guarantees in exchange for significant discount. Provider gains utilization in off-peak periods.
Batch workloads smooth demand curves: if real-time peaks at 2pm, batch jobs absorb 2am-6am idle capacity. This improves overall fleet utilization from 40-50% to 80-90%.
Land and expand: Price serverless tier aggressively (even below cost) to acquire developers, monetize at scale when usage grows.
Value-based: Voice agent inference priced per-minute, not per-token. Aligns with customer’s value perception — they think in “minutes of agent time,” not tokens.
Competitive moat: If you have 2x throughput advantage (FireAttention), price 30% below competitors while maintaining better margins. Your optimization is their impossibility.
Managed inference sells outcomes at 55-70% gross margin via statistical multiplexing. GPU rental sells infrastructure at 34-40% margin with simpler operations but massive CapEx.
Uber vs Hertz. Uber sells rides (outcomes) and pools cars across thousands of passengers. Hertz rents you the car — you worry about driving. Both make money on vehicles, but the economics are completely different.
| Dimension | Managed Inference | GPU Rental |
|---|---|---|
| What you sell | Tokens, completions, minutes | Raw GPU-hours |
| Customer thinks about | Outcomes | Infrastructure |
| Gross margin | 55-70% | 34-40% |
| Key advantage | Statistical multiplexing | Simpler operations |
| CapEx intensity | Lower (can rent GPUs) | Very high |
| R&D cost | High (serving stack) | Moderate (platform) |
| Risk | Correlated demand spikes | Utilization & pricing deflation |
Statistical multiplexing — Customer A peaks at 2pm, Customer B at 6pm, Customer C runs batch overnight. Pooling across thousands of customers enables 80-90% GPU utilization, far higher than any single customer achieves.
Like insurance pooling — the law of large numbers applies to token traffic. The risk: correlated demand spikes. When a new model drops or a viral AI demo happens, everyone hits the API simultaneously. This is the “catastrophic event” requiring burst capacity or graceful degradation.
→ How continuous batching enables this

Input vs output split: Output tokens cost 2-4x more to serve. Without the split, adverse selection occurs — output-heavy workloads are subsidized.
Cached input pricing: ~50% discount incentivizes consistent prefixes, improving prefix cache hit rates. Pricing that shapes behavior.
Model-specific pricing: Based on cost to serve, competitive pricing, demand elasticity, and strategic value. Popular models subsidize long-tail catalog with sparse traffic.
| GPU | Fireworks On-Demand | Crusoe On-Demand | Crusoe Spot |
|---|---|---|---|
| H100 SXM | $5.49/hr | $3.90/hr | $1.60/hr |
| A100 80 GB | $3.19/hr | $1.95/hr | $1.30/hr |
The managed premium: Fireworks charges ~40% more than Crusoe on-demand for the same GPU because the rate bundles inference-optimized software (FireAttention, LoRA multiplexing, autoscaling). Crusoe spot at $1.60/hr represents raw infrastructure. The spread ($1.60 → $5.49) is where the entire inference software stack gets monetized.
1. Model catalog: Which models to add and when — each has hosting cost whether used or not. Warm GPUs for a model nobody calls is pure waste.
2. Rate limits & SLA tiers: Maps to burst capacity reserved per customer. Higher tier = more dedicated headroom = higher price.
3. Model deprecation: Old models eat GPU memory, but enterprise customers depend on them. Migration timelines are diplomatic minefields.
4. Optimization pass-through: When your team ships a 2x throughput improvement, do you pass savings to customers (growth) or keep as margin (profitability)?
When Crusoe launches Managed Inference alongside GPU rental, they create internal tension. If MemoryAlloy delivers 9.9x better TTFT and 81% cost reduction, why would any inference customer rent raw GPUs?
How MemoryAlloy works: a cluster-wide KV cache that uses RDMA to pool GPU memory across nodes. Instead of each GPU holding its own cache, MemoryAlloy treats the entire cluster’s HBM as a shared resource. This enables instant context reuse across requests — the 9.9× TTFT improvement comes from eliminating redundant prefill when cached KV states exist elsewhere in the cluster. Supported models include Llama 3.1/3.3, DeepSeek R1/V3, and Qwen 2.5.
The resolution: GPU rental increasingly serves training customers and custom workloads. Managed inference captures inference demand at higher margin. Total revenue per GPU potentially goes up — but the PM must model the revenue migration carefully.
GPU cloud providers differentiate on platform features beyond raw compute. Crusoe’s stack illustrates what enterprise buyers expect:
Networking: InfiniBand for GPU-to-GPU, NVLink within nodes, RDMA for cluster-wide memory access. These determine whether multi-node inference and distributed training are viable.
Orchestration: Managed Kubernetes (K8s) and Slurm for job scheduling. AutoClusters provision multi-node GPU environments in minutes rather than days of manual configuration.
Isolation: VPC (Virtual Private Cloud) for network isolation, per-minute billing granularity, and enterprise security boundaries.
SLAs: 99.98% uptime guarantee, <6 minute support response time. These commitments unlock enterprise procurement cycles that require contractual guarantees.
Hardware roadmap: GB200, B200, and MI355X availability signals to customers that the provider won’t strand them on last-gen hardware — critical for multi-year capacity planning.
| Model | GPU Rental Revenue/hr | Managed Inference/hr (optimized) | Multiplier |
|---|---|---|---|
| Llama 3.1 8B | $2.20 (full H100, overkill) | $8-15/hr (high throughput, massive volume) | 4-7x |
| Llama 3.1 70B | $2.20 × 2 GPUs = $4.40 | $5-10/hr (moderate throughput) | 1-2x |
| Llama 3.1 405B | $2.20 × 8+ GPUs = $17.60+ | $8-15/hr (low throughput, premium price) | 0.5-1x |
Managed inference generates more revenue per GPU-hour for popular small-to-mid models (where batching and optimization shine). For very large models (405B+), GPU rental can yield comparable revenue because batch sizes are limited and multi-GPU overhead is high.
CapEx-heavy companies trade at lower revenue multiples. Pure SaaS growing 50% trades at 15-20x revenue. Infrastructure company growing 50% with 60% CapEx intensity trades at 5-8x.
But recurring infrastructure revenue has its own premium. Long-term contracted data center revenue is valued almost like a bond. Digital Realty and Equinix trade at 20-30x AFFO because revenue is contracted and recurring.
Crusoe’s IPO narrative: If positioned as “we spend billions on data centers” → infrastructure multiples. If positioned as “we have $1.5B in contracted cloud ARR growing 5x with managed inference software on top” → blended multiples. Product mix decisions directly affect how the market values the entire company.
Buying a GPU = being long a depreciating asset with uncertain future value. Renting = buying a monthly call option on compute capacity. The optimal strategy is a barbell: own base-load, rent flexibility.
Electric utilities: own baseload power plants (nuclear, hydro — predictable, cheap per kWh) and buy peaking capacity on the spot market (gas turbines — flexible, expensive per kWh).
| Scenario | H100 Residual (3yr) | Value | Implication |
|---|---|---|---|
| Bull case | 40% | ~$11K | Ownership strongly favored |
| Base case | 20% | ~$5.5K | Ownership favored at high utilization |
| Bear case | 5% | ~$1.4K | Next-gen makes it essentially worthless |
Buy side TCO per GPU-hour: Purchase price amortized over useful life + cost of capital + power/cooling + maintenance (2-5% annual failure rate) − residual/salvage value.
Depreciation tension: H100 straight-line over 3 years = ~$0.95/hr. Over 5 years = ~$0.57/hr. But GPU tech cycles are accelerating: H100 (2022) → H200 (2024) → B200 (2025). If Blackwell delivers 3-5x performance per dollar, an H100 on a 5-year schedule becomes economically obsolete before it is fully depreciated.
Rent side: ~$2.00-3.00/hr for H100. No upfront CapEx. Flexibility to scale. Obsolescence risk sits with the lessor — but you’re paying their margin.
If blended WACC for GPU purchases is 12%, the $28K H100 has economic cost of $3,360/year in capital charges alone — adding ~$0.38/hr to ownership cost, pushing fully loaded from $1.10 to ~$1.48/hr.
Breakeven utilization moves from 44% to ~59%. If funded with pure equity at 25%, breakeven jumps to ~76%. Interest rate environment matters: 300-400 bps difference on $1B GPU purchase = $30-40M/year in additional interest expense.
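The breakeven figures above follow from one formula. The sketch assumes breakeven means loaded hourly cost (base cost plus capital charge) divided by the market rental rate; the $28K price, $1.10/hr base cost, and $2.50/hr rental rate are the values used in the text.

```python
# Cost-of-capital math: capital charge per GPU-hour at a given WACC, and
# breakeven utilization against a market rental rate. Assumes breakeven =
# loaded hourly cost / rental price, per the figures in the text.

GPU_PRICE = 28_000          # H100 all-in purchase price
BASE_COST_PER_HR = 1.10     # amortization + opex before capital charge
RENTAL_RATE = 2.50          # market $/hr the ownership case must beat

def breakeven_utilization(cost_of_capital: float) -> float:
    capital_charge_hr = GPU_PRICE * cost_of_capital / 8_760
    loaded = BASE_COST_PER_HR + capital_charge_hr
    return loaded / RENTAL_RATE

print(f"no capital charge: {BASE_COST_PER_HR / RENTAL_RATE:.0%}")  # 44%
print(f"12% WACC:          {breakeven_utilization(0.12):.0%}")     # 59%
print(f"25% pure equity:   {breakeven_utilization(0.25):.0%}")     # 76%
```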
Renting = buying a monthly call option on compute. You pay a premium but maintain optionality to switch hardware, scale down, or pivot. Option value increases when volatility is high (AI hardware changing every 12-18 months), time horizon is uncertain, and interest rates are high.
Buying is attractive when: Demand is highly predictable (Crusoe’s 15-year Abilene lease), structural cost advantage exists (cheap power extends economic life), and the asset can be redeployed across use cases.
→ GPU Memory Hierarchy details

The optimal strategy: heavy ownership of base-load capacity funded by low-cost infrastructure debt secured against long-term contracts, combined with rental/spot capacity for flexibility.
Like utilities: own baseload plants, buy peaking capacity on the spot market. The owned portion provides cost advantage; the rented portion provides optionality. The ratio depends on demand predictability and cost of capital.
Crusoe's ~$50/kW/month energy advantage at a 100MW facility = $60M/year in structural cost savings that flow directly to margin or competitive pricing.
Real estate for aluminum smelters — you don’t price by square footage because the smelter’s value is entirely determined by access to cheap electricity. Same with AI data centers.
| Facility Type | $/kW/month |
|---|---|
| Wholesale colocation (legacy) | $80-120 |
| Retail colocation | $120-180 |
| AI-optimized facility | $150-250+ |
| Hyperscaler self-build | $50-80 effective |
Traditional operator at $0.10/kWh: 1 kW continuous for a month = $73 just in electricity. Out of a $150/kW/month price, nearly half is power. Crusoe at $0.03/kWh: the same 1 kW = $22/month.
| Location | Capacity | Energy Source |
|---|---|---|
| Abilene, TX | 1.2 GW | Grid + renewables |
| Iceland | Expanding | Geothermal (near-zero carbon) |
| Wyoming | Operating | Stranded natural gas |
| Norway | Planned | Hydroelectric |
| Argentina | Planned | Stranded gas (Vaca Muerta) |
Crusoe’s Abilene campus alone at 1.2 GW would rank among the largest single-site data center deployments globally. The geographic diversification across clean/stranded energy sources hedges against regional power regulation and positions the company for enterprise customers with carbon-reduction mandates. A nuclear energy partnership with Blue Owl Capital adds baseload diversification beyond renewables.
Legacy server rack: 2-4 kW. AI rack with 8 H100s: 40-70 kW. GB200 NVL72 rack: 120-140 kW. That’s 30-70x more power but roughly the same physical footprint. Pricing by square footage breaks completely.
Power delivery infrastructure alone costs millions: transformers ($1-5M each), switchgear, UPS, backup generators, distribution. Redundancy requirements (N+1 or 2N) mean 100 kW provisioned → 200 kW built.
Every watt consumed becomes a watt of heat to remove. At 140 kW/rack, air cooling physically cannot keep up — liquid cooling is required. Cost scales linearly with kW.
PUE captures this overhead. PUE of 1.1 (liquid) means 10% cooling overhead. PUE of 1.4 (air) means 40%. The difference at 100MW: an extra 30MW consumed just for cooling, or ~$15M/year at grid rates.
The $/kW/month price bundles: power delivery infrastructure (transformers, switchgear, UPS, generators, distribution), cooling capacity (linearly proportional to power), redundancy (N+1 or 2N), and physical space (essentially free at this point — tiny fraction of total cost).
This is why the metric is $/kW/month: it captures the actual scarce resource (power capacity) rather than the abundant one (floor space).
| Component | Industry $/MW | Crusoe $/MW | Advantage? |
|---|---|---|---|
| Shell & core construction | $9-15M | $8-12M | Modest (cheap land, modular build) |
| AI fit-out (GPUs, networking) | $20-25M | $20-25M | None — GPU cost is GPU cost |
| Total all-in per MW | $35-45M | $28-37M | — |
Shell & core components: Land ($100-500/kW, 1-3% of total), power infrastructure ($2-4.5K/kW, 20-30% — Crusoe saves on grid interconnection via behind-the-meter), cooling ($1.5-3K/kW, 15-20% — cold-climate sites save on heat rejection), building ($600-1.2K/kW, 5-10%), networking ($350-900/kW, 3-7%).
GPU depreciation mismatch is the biggest financial risk. Accounting: 3-5 years. Economic life: 2-3 years. Crusoe’s cheap power extends viability of older GPUs for batch workloads — converting a liability (aging GPUs) into an asset (cheap compute).
| Component | Industry $/MW/yr | Crusoe $/MW/yr | Advantage? |
|---|---|---|---|
| Electricity | $675K-1,050K | $190-440K | Yes — structural |
| Cooling operations | $150-400K | $50-150K | Yes — partial |
| Personnel | $200-400K | $200-450K | No — possibly higher (rural premiums) |
| Hardware maintenance | $150-350K | $150-350K | No — GPU failure rates are physics |
| Networking & connectivity | $50-120K | $60-150K | No — dark fiber to rural costs more |
| Other (insurance, software, etc.) | $80-280K | $80-280K | No |
| Total OpEx | $1.3-2.7M | $810K-1.9M | — |
Critical honesty: The $50-80M/year advantage at 100 MW is almost entirely electricity and cooling (~$48-75M of total). Personnel, maintenance, networking, and compliance run at industry rates or slightly higher due to remote locations. The energy advantage is large enough to dominate total OpEx regardless.
The math that defines the moat: 1 kW × 730 hrs/mo at $0.10/kWh = $73/kW/mo. At $0.03/kWh = $22/kW/mo. Delta: $51/kW/mo = $612/kW/yr. At 1.2 GW (Abilene full scale): $735M/year structural advantage.
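The moat math above, end to end, in a short sketch (the ~$1M of rounding in the final figure comes from carrying cents through rather than rounding per step):

```python
# Energy moat: $/kWh delta -> $/kW/month -> annual advantage at 1.2 GW.

HOURS_PER_MONTH = 730                    # 8,760 / 12

def kw_month_cost(usd_per_kwh: float) -> float:
    return usd_per_kwh * HOURS_PER_MONTH

grid = kw_month_cost(0.10)               # $73/kW/mo
crusoe = kw_month_cost(0.03)             # ~$22/kW/mo
delta_yr_per_kw = (grid - crusoe) * 12   # ~$613/kW/yr

abilene_kw = 1_200_000                   # Abilene at full 1.2 GW scale
advantage = delta_yr_per_kw * abilene_kw
print(f"~${advantage / 1e6:.0f}M/year structural advantage")
```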
| Market | $/kW/month (2025) | Notes |
|---|---|---|
| US wholesale average | $163-184 | CBRE H1 2025: $184/kW/mo |
| Northern Virginia (Ashburn) | $200-215+ | Record highs, <2% vacancy |
| Silicon Valley | $200-250+ | Highest US cost market |
| Phoenix | ~$190 | Rapid growth, 1.7% vacancy |
| Dallas/Texas | $140-170 | Business-friendly regulatory |
| Atlanta | $120-150 | Cheapest major US market |
| AI-optimized high-density | $180-300+ | Premium for liquid cooling, 100+ kW/rack |
| Iceland/Nordics | $80-130 | Cheap hydro/geothermal; higher connectivity |
| Singapore | $350-450+ | Most expensive globally |
Key trend: Nearly 75% of the 5,242 MW under construction in North America is pre-leased. Some commitments extend to capacity not delivering until 2027+. This is a seller’s market — every MW of operational capacity commands premium pricing.
Crusoe as competitor: Energy advantage means Crusoe can price at $100-140/kW/mo and still achieve superior margins vs competitors at $160-200/kW/mo.
| Category | Industry All-In | Crusoe Estimated |
|---|---|---|
| Total CapEx | $3.0-4.0B | $2.8-3.7B |
| Shell & core | $900M-1.5B | $800M-1.2B |
| AI fit-out (GPUs, networking) | $2.0-2.5B | $2.0-2.5B |
| Annual OpEx | $130-250M | $75-175M |
| Electricity | $67-105M | $19-44M |
| Cooling | $15-40M | $5-15M |
| Personnel | $20-40M | $20-45M |
| Hardware maintenance | $15-35M | $15-35M |
| Revenue (85% util) | $200-350M | $200-350M |
| Target Gross Margin | 35-50% | 40-55% |
The CapEx savings ($100-300M) are real but limited to shell & core — NVIDIA prices GPUs the same for everyone. The margin improvement (5-10 points) comes from OpEx: $50-80M/year in electricity and cooling savings on $200-350M revenue. Over a 15-year facility life, ~$65M/year average advantage compounds to nearly $1B in cumulative savings.
Equity costs 3-4x more than debt because equity holders bear residual risk with no contractual protection. The tax shield on debt interest makes the gap even wider.
A building with floors. Debt holders live on the ground floor — if the building sinks, they’re last to get wet. Equity holders live in the penthouse — great view, but first to feel any earthquake. Higher floor = higher risk = higher rent.
| Dimension | Debt | Equity |
|---|---|---|
| Priority in liquidation | First (senior) | Last (residual) |
| Cash flows | Contractual (fixed interest) | Residual (dividends optional) |
| Upside | Capped at interest rate | Uncapped |
| Downside | Protected by covenants | Can lose everything |
| Tax treatment | Interest is deductible | Returns are not |
| Typical cost | 7-10% | 20-30%+ |
Interest payments are tax-deductible. At 21% corporate tax rate, 8% interest → ~6.3% after-tax cost. Equity returns have no such benefit.
On $5B of debt at 8%, the tax shield is worth $84M/year — pure value creation from choosing debt over equity for the same investment.
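The tax-shield arithmetic above in two one-line functions, using the 21% corporate rate from the text:

```python
# After-tax cost of debt and the annual dollar value of the tax shield.
# Interest is deductible; equity returns are not.

def after_tax_cost(rate: float, tax_rate: float = 0.21) -> float:
    return rate * (1 - tax_rate)

def annual_tax_shield(principal: float, rate: float, tax_rate: float = 0.21) -> float:
    return principal * rate * tax_rate

print(f"{after_tax_cost(0.08):.2%}")                        # 6.32% after-tax
print(f"${annual_tax_shield(5e9, 0.08) / 1e6:.0f}M/year")   # $84M/year on $5B
```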
Debt holder’s max upside is the interest rate (8%). They price for downside protection via covenants, collateral, and priority. Equity investors face uncapped upside and uncapped downside — they need much higher expected returns to justify the risk.
Information asymmetry compounds this: debt holders protect themselves with covenants, while equity holders have board seats but can’t contractually force profitability. More trust required = riskier = more expensive.
If debt is cheaper, why not fund everything with debt? Three limits:
1. Financial distress risk: Too much leverage makes a temporary downturn existential. Missing one interest payment can trigger default cascades.
2. Debt capacity limits: Lenders won’t fund beyond cash flow coverage. The debt service coverage ratio (DSCR) sets a hard ceiling.
3. Rising marginal cost: First billion at 8%, fifth billion at 12%. At some point, additional debt costs more than equity.
A 40% commitment discount looks expensive in isolation, but the contracted revenue it creates can unlock debt capacity whose NPV far exceeds the discount given. Pricing decisions directly affect capital structure.
Getting a mortgage. The bank doesn’t just look at the house (collateral) — they look at your salary and employment contract (contracted revenue). A tenured professor with $100K salary gets a bigger mortgage at a lower rate than a freelancer earning $150K with no contract.
Lenders size loans using DSCR — they want cash flow to cover annual debt service by 1.3-2.0x.
Without contracted revenue: the lender underwrites to a conservative $300M revenue → $150M FCF → supports ~$100M/year debt service → ~$1-1.5B total debt capacity.
With a 15-year Oracle contract at $600M/year: the lender underwrites to the full $600M → $400M FCF → supports ~$267M/year debt service → ~$3-4B total debt capacity. Same facility, dramatically different borrowing power.
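The sizing logic can be sketched as follows — a simplified illustration of the numbers above (real underwriting models are far more detailed; function names, the 1.5x DSCR, and the 20-year/8% capacity assumptions are mine):

```python
# DSCR-based debt sizing, using the illustrative figures from the text.

def max_annual_debt_service(fcf: float, dscr: float) -> float:
    """Lenders require FCF >= dscr * annual debt service."""
    return fcf / dscr

def debt_capacity(payment: float, rate: float, years: float) -> float:
    """Present value of a level debt-service annuity = principal it supports."""
    return payment * (1 - (1 + rate) ** -years) / rate

# Without contracted revenue: $150M FCF at an assumed 1.5x DSCR.
svc = max_annual_debt_service(150e6, 1.5)
print(svc / 1e6)                           # 100.0 -> ~$100M/yr debt service
print(debt_capacity(svc, 0.08, 20) / 1e9)  # ~0.98 -> ~$1B, in the $1-1.5B range

# With the 15-year contract: $400M FCF at the same 1.5x DSCR.
print(max_annual_debt_service(400e6, 1.5) / 1e6)  # ~266.7 -> ~$267M/yr
```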
Contracted revenue with investment-grade counterparty compresses spread: SOFR + 400bps → SOFR + 250bps. On $5B = $75M/year in interest savings.
Looser covenants: fewer restrictions on additional borrowing, less stringent maintenance ratios, more operational flexibility. Longer tenor: 15-year contracted revenue supports 10-12 year debt vs 3-5 years for uncontracted.
Lenders take an assignment of the contracts as a security interest. If Crusoe defaults, JPMorgan could step into Crusoe’s position and receive Oracle’s lease payments directly.
Like mortgage-backed securities — the payment streams themselves are the security. Not the physical assets, but the right to receive contracted cash flows.
15-year contracted revenue supports 10-12 year debt. Uncontracted supports only 3-5 years. The math on identical $1B borrowed:
| Metric | 5-year at 8% | 12-year at 7% | Difference |
|---|---|---|---|
| Annual principal | $200M | ~$83M | — |
| Year 1 interest | ~$80M | ~$70M | — |
| Total Year 1 debt service | ~$280M | ~$153M | $127M freed |
| DSCR on $400M FCF | 1.43x (tight) | 2.61x (comfortable) | — |
| Revenue decline before breach | 9% | 45% | — |
$127M/year less in debt service to reinvest, build capacity, or weather downturns. A competitor with shorter-tenor debt has higher fixed obligations and less room to cut prices during a price war. Debt structure becomes a competitive weapon.
The Abilene example: $9.6B JPMorgan facility secured against 15-year Oracle/Stargate lease. On 5-year debt, annual principal alone = ~$1.92B vs ~$800M on 12-year. The difference ($1.1B/year) would make the project borderline unfinanceable.
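The tenor table can be recomputed under simple assumptions: straight-line principal amortization, year-1 interest on the full balance, and a 1.3x covenant floor — the covenant level is my assumption, not stated in the source:

```python
# Year-1 debt service for $1B borrowed at two tenors.

def year1_debt_service(principal: float, rate: float, years: int) -> float:
    """Straight-line principal repayment plus year-1 interest on the full balance."""
    return principal / years + principal * rate

short = year1_debt_service(1e9, 0.08, 5)    # ~$280M
long_ = year1_debt_service(1e9, 0.07, 12)   # ~$153M
fcf = 400e6

print(round(fcf / short, 2))  # 1.43 -> tight DSCR
print(round(fcf / long_, 2))  # 2.61 -> comfortable DSCR

# Headroom before breaching an assumed 1.3x covenant, treating FCF as
# falling one-for-one with revenue:
print(round(1 - 1.3 * short / fcf, 2))  # 0.09 -> ~9% decline tolerated
print(round(1 - 1.3 * long_ / fcf, 2))  # ~0.5 (the table's 45% implies a
                                        # slightly stricter covenant)
```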
Every 1-year or 3-year GPU reservation contract signed makes the debt package more attractive to lenders → cheaper debt → lower WACC → more competitive pricing → more customers → more contracts → even more debt capacity.
This is why commitment discounts aren’t just about revenue: they’re a capital structure optimization. The PM must model the full-cycle NPV, not just the direct pricing impact.
Reserved pricing on owned infrastructure: At Crusoe’s $1.55/hr cost, a 40% reserved discount yields $1.32/hr — a -15% GPU-hour margin. But the contracted revenue enables debt at SOFR + 250bps instead of + 400bps and extends tenor from 5yr to 12yr. The capital structure benefit (lower WACC, $127M/yr less debt service per $1B) often exceeds the margin given away. The correct analysis is NPV of the entire capital structure impact, not P&L on the individual GPU-hour.
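A minimal sketch of the two sides of that trade-off, using only the document’s illustrative figures (the variable names and structure are mine):

```python
# Margin given away on the reserved discount vs financing benefit unlocked.

cost_per_hr = 1.55
list_price = 2.20
reserved_price = list_price * (1 - 0.40)   # $1.32/hr after the 40% discount

gpu_hour_margin = (reserved_price - cost_per_hr) / cost_per_hr
print(round(gpu_hour_margin, 3))           # -0.148 -> about -15% per GPU-hour

# Financing side: 150bps of spread compression (SOFR+400 -> SOFR+250) on $5B.
interest_savings = 5e9 * 0.0150
print(interest_savings / 1e6)              # 75.0 -> $75M/year
```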
Debt requires predictability. Equity tolerates uncertainty. Crusoe runs two capital structures in one company: project-finance-funded infrastructure + venture-funded software.
A person’s financial life. College student = 100% “equity” (parental support, scholarships). First job = some debt (car loan). Established career = mortgage, credit lines. Retired = optimized portfolio. More predictable income unlocks more leverage.
Series A: No revenue, no assets, no track record. Might not exist in 18 months. Cost of equity: 50-100%+ implied (VCs need 100x potential). Debt is unavailable. Mix: 95-100% equity. Exception: venture debt (20-30% of last equity round) with warrants.
Series B: $5-20M ARR, real product, paying customers. Cost of equity: 30-50%. Cost of debt: 12-15% (venture debt with warrants). Mix: 80-90% equity. Equipment financing becomes possible (80% LTV on GPU purchases).
Series C/D: $50-200M+ ARR, proven unit economics, enterprise contracts. Cost of equity: 15-25%. Cost of debt: 8-12% (term loans, asset-backed). Mix: 60-80% equity. CoreWeave pioneered billions in GPU-backed debt at this stage.
Pre-IPO: $200M-1B+ revenue, clear profitability path. Cost of equity: 12-20%. Cost of debt: 6-9%. Mix: 40-60% equity. Convertible notes popular: lower interest (2-5%) with embedded call option. Crusoe is here now (Series E, $10B+).
Public company: Cost of equity drops dramatically to 10-15% (liquidity, transparency, diversifiability). Cost of debt: 4-7% (investment-grade bonds, commercial paper). The gap narrows but never closes — debt holders are always paid first.
Active capital management: share buybacks, debt-funded buybacks, dividend policy, credit rating management. The CFO becomes a portfolio manager optimizing the capital structure continuously.
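The stage progression implies a steadily falling blended cost of capital. A rough sketch using the midpoint of each range from the stages above (the midpoint choices are mine, and taxes are ignored):

```python
# Blended cost of capital by funding stage (simple weighted average, pre-tax).

def blended_cost(equity_share: float, cost_equity: float, cost_debt: float) -> float:
    return equity_share * cost_equity + (1 - equity_share) * cost_debt

stages = {
    "Series B":   (0.85, 0.40, 0.135),  # 80-90% equity, 30-50% CoE, 12-15% CoD
    "Series C/D": (0.70, 0.20, 0.10),   # 60-80% equity, 15-25% CoE, 8-12% CoD
    "Pre-IPO":    (0.50, 0.16, 0.075),  # 40-60% equity, 12-20% CoE, 6-9% CoD
}
for name, (e, ce, cd) in stages.items():
    print(name, round(blended_cost(e, ce, cd), 3))
# Series B ~0.360, Series C/D ~0.170, Pre-IPO ~0.118: leverage and
# maturity steadily cheapen the overall capital stack.
```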
Crusoe runs two capital structures simultaneously:
Project-finance infrastructure: Abilene’s $9.6B debt + $5B equity. Low cost of capital, asset-heavy, contracted cash flows. Like a utility or pipeline company.
Venture-funded software: Cloud platform, managed inference. High cost of capital, asset-light, uncertain returns. Like a typical tech startup.
The PM must understand which investments belong to which bucket. Managed inference feature ($20M) → equity-funded, needs venture-scale returns. Data center with signed contract → leverage cheap debt, lower hurdle rate.
The default mental model of depreciation is wrong right now. Normally: spend $100 on an asset, it depreciates, and it is worth less than $100 immediately. AI data center infrastructure breaks this because of an extreme supply-demand imbalance.
Supply side — a cascade of sequential bottlenecks:
1. Power interconnection (3-5 year queue) — can’t buy your way to the front.
2. Permitting (12-18 months) — a single legal challenge adds a year.
3. Skilled labor (fully booked) — 140 kW/rack facilities need specialized contractors.
4. Equipment (12-24 months) — switchgear up 50%, generators up 45% since 2021.
5. GPUs (allocation-constrained).
Total: 2-4 years from “decide to build” to “serving customers.” A completed facility represents the crystallized result of navigating every bottleneck. The real estate parallel: a house costs $400K to build but sells for $1.2M — the delta is the embedded scarcity of entitled, permitted, connected land.
When this reverses: Supply catches up (hyperscaler buildouts complete), demand plateaus (AI scaling hits diminishing returns), or technology shifts (inference efficiency, on-device inference). Crusoe’s energy advantage persists even if scarcity fades — cheap power remains valuable regardless.
Core principle: Commit heavily where certainty is high. Maintain cheap optionality where it isn’t. Avoid the expensive middle of moderate commitment with moderate flexibility.
| Domain | Owned Extreme (~80%) | Flex Extreme (~20%) | Middle to Avoid |
|---|---|---|---|
| GPU Fleet | Owned GPUs for contracted customers; equipment-backed debt | Rented spot/on-demand for spikes | Buying on speculation without revenue certainty |
| Data Centers | Owned facilities with 15yr leases (Abilene) | Leased colo in new markets to validate demand | Building 500 MW with only 100 MW contracted |
| Power | Stranded gas/hydro/geothermal; long-term PPAs | Spot grid for peaks | Building 200 MW turbine for uncertain demand |
| Model Catalog | Deep optimization on top 10-15 models (80%+ revenue) | vLLM defaults on 80+ long-tail models | Moderate optimization across 50 models |
| Customers | Enterprise ($500K-5M+ contracts, high touch) | Self-serve developers (near-zero CAC) | Mid-market ($50-200K, significant sales effort, poor LTV/CAC) |
| Pricing | Aggressive serverless (acquisition, below cost) | Premium enterprise (40-60% margin) | Moderate pricing that attracts neither segment |
Why barbell works in AI: Extreme demand uncertainty (single model release shifts demand by orders of magnitude), rapid technology obsolescence (GPU generations every 18-24 months), and power law economics (small number of models/customers drive vast majority of revenue). Invest heavily in winners, maintain cheap options on everything else.