The Business of AI Compute

From Token to Balance Sheet

How AI inference is priced, built, financed, and scaled — the economics behind every API call.

D · Unit Economics — the micro view: per-token cost & price
D1 · The Cost Stack · D2 · Throughput as Margin · D3 · Pricing Structures

E · Business Models — company decisions: build, rent, or serve
E1 · Managed Inference vs GPU Rental · E2 · Buy vs Rent GPUs · E3 · Data Center Economics

F · Capital Structure — macro view: how it all gets financed
F1 · Why Equity Costs More Than Debt · F2 · Contracted Revenue & Lenders · F3 · Debt vs Equity Across Stages
Economics of AI Inference

Follow the Dollar

The technical pipeline follows the token from request to response. This page follows the dollar — from the cost of a single GPU-hour through pricing, business models, and capital markets.

D · Unit Economics
What it costs to serve a token & what you charge for it
D1 · The Cost Stack
Bottom-up cost per GPU-hour: hardware amortization, power, networking, data center, and operations. CapEx vs OpEx drives everything.
Key Takeaway

Cost per million tokens = Total hourly GPU cost ÷ tokens served per hour. Throughput optimization is literally margin expansion.

Think of it like...

A restaurant's cost per meal — ingredients, rent, labor, utilities. Two restaurants with identical kitchens can have wildly different costs per plate based on how many covers they turn.
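The takeaway formula is mechanical enough to sketch. A minimal version, using illustrative numbers rather than provider quotes:

```python
# Cost per million tokens = total hourly GPU cost / tokens served per hour.
# The $2.20/hr rate and the two throughput levels are illustrative.

def cost_per_million_tokens(gpu_cost_per_hr: float, tokens_per_sec: float) -> float:
    """Hourly GPU cost divided by hourly token throughput, per 1M tokens."""
    tokens_per_hr = tokens_per_sec * 3600
    return gpu_cost_per_hr / tokens_per_hr * 1_000_000

naive = cost_per_million_tokens(2.20, 300)    # ~$2.04 per M tokens
tuned = cost_per_million_tokens(2.20, 1200)   # ~$0.51 per M tokens, same GPU, 4x cheaper
```

Same numerator, bigger denominator: every point of throughput goes straight to unit cost.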

Cost Comparison

| Component | Crusoe (owns infra) | CoreWeave (leases some) |
| --- | --- | --- |
| GPU CapEx amortized (3yr) | ~$0.95/hr | ~$1.10/hr |
| Power | ~$0.03/hr | ~$0.07/hr |
| Data center (amortized) | ~$0.15/hr | ~$0.25/hr |
| Networking | ~$0.10/hr | ~$0.10/hr |
| Operations/platform | ~$0.10/hr | ~$0.12/hr |
| Total cost per H100-hr | ~$1.33/hr | ~$1.64/hr |
| Selling price | $2.20/hr | $2.20/hr |
| Gross margin | ~40% | ~25% |
Hardware / Compute
GPU purchase & amortization — the largest line item

GPU purchase: H100 SXM ~$25-30K, B200 ~$30-40K. Purchased GPUs amortize over 3-5 years — cheaper per hour if utilization is high, but depreciation risk from technology obsolescence. Rented GPUs are OpEx at ~$2.00-3.00/hr mid-market (down from $7-8/hr peak).

GPU failure rate: ~2-5% annually, requiring spare buffer inventory. At 10,000 GPUs, expect 200-500 failures per year.

| GPU | On-Demand | Spot | Provider |
| --- | --- | --- | --- |
| H200 SXM | $4.29/hr | — | Crusoe |
| H100 SXM | $3.90/hr | $1.60/hr | Crusoe |
| A100 80 GB | $1.95/hr | $1.30/hr | Crusoe |
| MI300X | $3.45/hr | $0.95/hr | Crusoe |

Spot pricing fills idle capacity at 50-70% discounts. Workloads must tolerate preemption (checkpointing required), making spot ideal for batch inference and fault-tolerant training but unsuitable for latency-sensitive serving.
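The buy-vs-rent tension in this card comes down to utilization. A minimal sketch, assuming straight-line amortization and the illustrative $28K purchase and $2.50/hr rental figures:

```python
# Amortized ownership cost per utilized hour vs a flat rental rate.
# $28K purchase price and the utilization levels are assumptions for illustration.

def owned_cost_per_hr(purchase_price: float, years: float, utilization: float) -> float:
    """Purchase price spread over the hours the GPU actually serves."""
    utilized_hours = years * 8760 * utilization
    return purchase_price / utilized_hours

high_util = owned_cost_per_hr(28_000, 3, 0.90)  # ~$1.18/hr, well under a $2.50 rental
low_util  = owned_cost_per_hr(28_000, 3, 0.30)  # ~$3.55/hr, renting wins
```

At low utilization the idle hours still carry the amortization, which is why ownership only pays off for steady base-load demand.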

Power & Cooling
PUE, electricity rates, and Crusoe’s structural advantage

Power cost varies wildly: $0.03-0.05/kWh (Crusoe’s stranded energy) vs $0.08-0.12/kWh (grid in Northern Virginia). A single H100 draws ~700W under load. At $0.10/kWh = $0.07/hr per GPU. At 10,000 GPUs = $6.1M/year in electricity.

PUE (Power Usage Effectiveness): 1.1 means 10% cooling overhead; 1.4 means 40%. Liquid cooling pushes closer to 1.1, air cooling sits at 1.3-1.5. Every 0.1 improvement in PUE on a 100 MW IT load is roughly 10 MW less overhead power, on the order of $7-9M/year at $0.10/kWh grid rates.
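The card's power math in one function. The chip wattage, PUE values, and rates below are the illustrative figures from the text:

```python
# Electricity cost per GPU-hour: chip draw grossed up by PUE, times the rate.

def power_cost_per_gpu_hr(chip_watts: float, pue: float, usd_per_kwh: float) -> float:
    facility_kw = chip_watts / 1000 * pue   # cooling overhead included via PUE
    return facility_kw * usd_per_kwh

grid_air      = power_cost_per_gpu_hr(700, 1.4, 0.10)  # ~$0.098/hr
crusoe_liquid = power_cost_per_gpu_hr(700, 1.1, 0.03)  # ~$0.023/hr
```

The 4x spread comes from two independent levers: the power rate and the cooling overhead.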

Networking & Storage
InfiniBand adds 15-25% to total cluster cost

InfiniBand for multi-GPU inference adds 15-25% to total cluster cost. For a 1,000 H100 cluster: $5-15M in NICs, switches, and cabling. NVIDIA/Mellanox near-monopoly limits price negotiation.

Storage for model weights, KV cache spill, logging: ~5-10% of compute cost.

→ How InfiniBand works in the pipeline → Training hardware costs comparison
Operations
SRE/DevOps staff, monitoring, on-call

Operational costs include SRE/DevOps staff, monitoring infrastructure, on-call rotations, and platform orchestration (Kubernetes, Slurm, auto-node-replacement).

GPU failures at 2-5% annually mean a 10,000-GPU fleet needs constant triage — detecting degraded GPUs, migrating workloads, RMA processing. This is where OpEx quietly compounds.

GPU Power Draw & Facility Cost
The single number that bridges infrastructure to GPU economics

Important distinction: GPU TDP is just the chip. A server slot includes CPU, system memory, NICs, NVSwitches, fans, and power supply losses. Facility power adds PUE overhead. Always use facility power per GPU slot for cost calculations.

| GPU | TDP/GPU | Server Config | Server W/GPU | Facility W/GPU (PUE 1.10) |
| --- | --- | --- | --- | --- |
| H100 SXM | 700W | HGX 8-GPU: ~10.2 kW | ~1,275W | ~1,400W |
| H200 SXM | 700W | HGX 8-GPU: ~10.2 kW | ~1,275W | ~1,400W |
| B200 | 1,000W | HGX 8-GPU: ~14.3 kW | ~1,790W | ~1,970W |
| GB200 (NVL72) | ~1,200W | NVL72: ~120 kW / 72 | ~1,667W | ~1,833W |

Electricity cost per GPU-hour = (Facility kW per GPU) × $/kWh

| GPU | Facility kW | At $0.10/kWh (grid) | At $0.03/kWh (Crusoe) | Delta |
| --- | --- | --- | --- | --- |
| H100 | 1.40 kW | $0.140/hr | $0.042/hr | $0.098/hr |
| B200 | 1.97 kW | $0.197/hr | $0.059/hr | $0.138/hr |
| GB200 | 1.83 kW | $0.183/hr | $0.055/hr | $0.128/hr |

At 85% utilization over a year (7,446 hrs), electricity per H100: Grid = $1,042/yr. Crusoe = $313/yr. Delta = $730/yr per GPU. At 10,000 GPUs: $7.3M/year saved — just from electricity.
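The annual figures fall straight out of the facility-kW column; a quick check with the same illustrative inputs:

```python
# Annual electricity per GPU at 85% utilization, grid vs low-cost power.

def annual_electricity(facility_kw: float, usd_per_kwh: float,
                       utilization: float = 0.85) -> float:
    hours = 8760 * utilization            # ~7,446 utilized hours per year
    return facility_kw * usd_per_kwh * hours

grid   = annual_electricity(1.40, 0.10)   # ~$1,042/yr per H100
crusoe = annual_electricity(1.40, 0.03)   # ~$313/yr per H100
fleet_savings = (grid - crusoe) * 10_000  # ~$7.3M/yr at 10,000 GPUs
```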

Three Ownership Scenarios
Grid competitor vs Crusoe-owned vs Crusoe-renting
| Component | Grid Competitor (colo) | Crusoe (owned) | Crusoe (renting colo) |
| --- | --- | --- | --- |
| GPU amortization (3yr) | $1.07/hr | $1.07/hr | $1.07/hr |
| Electricity | $0.14/hr | $0.04/hr | $0.13/hr |
| Infrastructure | $0.35/hr | $0.15/hr | $0.41/hr |
| Networking | $0.19/hr | $0.19/hr | $0.19/hr |
| Operations | $0.10/hr | $0.10/hr | $0.10/hr |
| Total cost | $1.85/hr | $1.55/hr | $1.90/hr |
| Selling price | $2.50/hr | $2.20/hr | $2.50/hr |
| Gross margin | 26% | 30% | 24% |

When Crusoe rents colo, both structural advantages vanish: electricity ($0.04 → $0.13) and infrastructure ($0.15 → $0.41). GPUs, networking, and operations cost the same regardless. Total delta between owned and renting: $0.35/hr per GPU. At 100 MW (~78,400 GPUs, 85% util): ~$204M/year the ownership model saves.

This is the entire argument for vertical integration in one table. Without ownership, Crusoe competes on the same cost structure as everyone else. The $0.35/hr advantage is the business model.

When renting makes sense anyway: Market validation (deploy 500 GPUs to prove demand before committing $500M+ to build), geographic reach (low-latency in Singapore/Frankfurt), and burst overflow (owned at 90%, rent the spike). The blend is what matters — own 80% base load, rent 20% flex.

The Complete Cost Bridge
Converting between $/kWh, $/kW/month, and $/GPU-hour

Every infrastructure cost can be converted to $/GPU-hour using facility kW per GPU slot as the bridge:

Infrastructure metric → conversion → GPU metric:

$/kWh (electricity) × kW/GPU × PUE → $/GPU-hr (power)
$/kW/month (colo) × kW/GPU ÷ 730 hrs → $/GPU-hr (infra)
$/GPU (purchase) ÷ useful-life hrs → $/GPU-hr (amort)
Sum of the above = total $/GPU-hr

Colo rate → per-GPU: At US avg $184/kW/mo, an H100 (1.40 kW): $258/mo infra = $0.35/hr. At N. Virginia $215/kW/mo: $301/mo = $0.41/hr. Infrastructure alone costs $0.35-0.41/hr before GPU amortization, networking, or electricity.
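The three conversions can be written down directly. All inputs are the illustrative figures from this card:

```python
# The cost-bridge conversions: everything lands in $/GPU-hr.

HOURS_PER_MONTH = 730

def kwh_to_gpu_hr(usd_per_kwh: float, kw_per_gpu: float) -> float:
    """Electricity: $/kWh times facility kW per GPU slot."""
    return usd_per_kwh * kw_per_gpu

def colo_to_gpu_hr(usd_per_kw_month: float, kw_per_gpu: float) -> float:
    """Colo rent: $/kW/month times kW per GPU, spread over a month."""
    return usd_per_kw_month * kw_per_gpu / HOURS_PER_MONTH

def purchase_to_gpu_hr(price: float, useful_life_hours: float) -> float:
    """Amortization: purchase price over useful-life hours."""
    return price / useful_life_hours

# H100 example: $184/kW/mo colo, 1.40 facility kW, $28K over 3 years.
infra = colo_to_gpu_hr(184, 1.40)             # ~$0.35/hr
power = kwh_to_gpu_hr(0.10, 1.40)             # ~$0.14/hr
amort = purchase_to_gpu_hr(28_000, 3 * 8760)  # ~$1.07/hr
```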

D2 · Throughput as Margin
Same GPU, same model — wildly different costs. Optimization level creates 3-5x throughput variance, which is why prices vary 10x+ across providers.
Key Takeaway

Throughput optimization IS margin expansion. If custom CUDA kernels serve 2x the tokens/sec on the same GPU, cost per token halves.

Think of it like...

Two factories with identical machines. One runs at 30% capacity with long changeover times; the other runs at 85% with quick changeovers. Same capital cost, dramatically different unit economics.

The Formula

Cost per million tokens = (total hourly GPU cost) ÷ (tokens served per hour)

The numerator is largely fixed (hardware + power + overhead); the denominator varies 3-5x with optimization level → same hardware, 3-5x cost variance.
What Drives the Denominator

3-5x throughput variance · 10x+ price variance · 80-90% target utilization
Model Size
8B is ~10x cheaper per token than 405B

Serving an 8B model is ~10x cheaper per token than 405B. The 405B model needs 8-16 GPUs (vs 1), performs 50x more matrix operations per token, and the larger KV cache per token reduces batch slots from 64+ to 8-16.

| Factor | 8B | 405B | Penalty |
| --- | --- | --- | --- |
| Weight data per token | ~16 GB | ~810 GB | ~50x |
| GPUs required | 1 | 8-16 | 8-16x cost |
| Communication overhead | Zero | 126 all-reduce ops | Pure penalty |
| Batch size | 64+ | 8-16 | 4-8x less throughput |
| Cost / M tokens | $0.03-0.05 | $0.50-1.00 | 10-20x |
Optimization Level
Naive vs fully optimized — same GPU, 3-5x throughput difference

The same model on the same GPU can have 3-5x throughput variance between naive and fully optimized serving. This includes custom attention kernels (FlashAttention, PagedAttention), quantization (FP16 → INT4), continuous batching, speculative decoding, and prefix caching.

This is why inference platform companies like Fireworks exist — their serving infrastructure (FireAttention, MemoryAlloy) extracts dramatically more tokens/sec from the same hardware.

→ Performance Optimization Stack details
Batching & Utilization
GPU serving one request at a time wastes most capacity

A GPU serving one request at a time wastes most of its compute capacity. Continuous batching enables 80-90% utilization by adding new requests to an in-flight batch as slots free up.

Request characteristics matter: long context uses more KV cache memory, reducing batch slots. Output tokens cost more than input tokens because decode is sequential while prefill is parallel.

→ How continuous batching works
Reasoning-Mode Throughput
Interleaved and preserved thinking change decode economics

Reasoning models add internal reasoning streams on top of visible output. Effective capacity planning must split visible tokens from reasoning tokens, then cap runtime via reasoning effort or thinking budget controls.

Interleaved thinking increases scheduler contention because reasoning and tool calls share the same decode loop. Preserved thinking can reduce repeated work across turns, but raises carried-context cost if history policies are too permissive.

→ Decode behavior for reasoning models
Hardware Generation
B200 with FP4 vs H100 at FP16

Hardware generation creates step-function improvements. B200 with FP4 serves the same model at dramatically higher throughput than H100 at FP16. Blackwell delivers 3-5x performance per dollar vs Hopper.

This drives the technology obsolescence risk in GPU ownership — an H100 purchased today may be economically obsolete before fully depreciated.

The Optimization Stack (15-17x)
From ~300 to ~5,000 tokens/sec on identical hardware

The same model on the same GPU varies 15-17x in throughput depending on optimization depth. Each layer compounds on the previous:

| Layer | Technique | Multiplier | Cumulative tok/s |
| --- | --- | --- | --- |
| Baseline | HuggingFace defaults, FP16, static batching | 1.0x | ~300 |
| 1 | Continuous batching (dynamic slot mgmt) | ~2.5x | ~750 |
| 2 | PagedAttention (KV cache virtualization) | ~1.7x | ~1,275 |
| 3 | FlashAttention (fused ops, SRAM tiling) | ~1.3x | ~1,658 |
| 4 | Quantization (FP16 → FP8/FP4) | ~1.8x | ~2,984 |
| 5 | Custom CUDA kernels (fused, arch-specific) | ~1.3x | ~3,879 |
| 6 | Speculative decoding (draft + verify) | ~1.3x | ~5,043 |

Layers 1-3 are table stakes (vLLM, TensorRT-LLM implement them). Layers 4-6 require deep GPU engineering — custom kernels that outperform open-source take years of accumulated expertise. This is the core moat of inference platforms like Fireworks (FireAttention) and Crusoe (MemoryAlloy).

Why the gap persists: Each new model architecture (MoE, multi-modal, new attention patterns) requires new kernel development. The optimization target keeps moving.
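The cumulative column is just the product of the per-layer multipliers; a one-liner confirms the ~17x figure:

```python
# Compounding the per-layer multipliers from the table over the baseline.

from functools import reduce

BASELINE_TOK_S = 300
MULTIPLIERS = [2.5, 1.7, 1.3, 1.8, 1.3, 1.3]  # layers 1-6 in order

final_tok_s = reduce(lambda rate, m: rate * m, MULTIPLIERS, BASELINE_TOK_S)
# ~5,042 tok/s, a ~16.8x lift on identical hardware
```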

Revenue per GPU-Hour
From $/token to $/GPU-hour — the punchline of optimization

Formula: Revenue/GPU-hr = tok/sec × 3,600 × $/token

Example: Llama 3.1 70B on optimized H100 cluster:

| Metric | Unoptimized | Fully Optimized |
| --- | --- | --- |
| Output throughput | ~200 tok/s | ~1,200 tok/s |
| Output revenue (at $0.90/M) | $0.65/hr | $3.89/hr |
| Input throughput | ~2,000 tok/s | ~8,000 tok/s |
| Input revenue (at $0.30/M) | $2.16/hr | $8.64/hr |
| Blended revenue/GPU-hr | $1.25/hr | $5.80/hr |

The punchline: At Crusoe’s $1.55/hr cost, unoptimized serving barely breaks even ($1.25/hr). Fully optimized yields ~73% gross margin ($5.80/hr). A 6x throughput improvement is a 6x revenue increase at constant pricing.
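The revenue formula, applied to the table's illustrative throughputs and prices:

```python
# Revenue per GPU-hour = tok/sec x 3,600 x $/token.

def revenue_per_gpu_hr(tok_per_sec: float, usd_per_million_tok: float) -> float:
    return tok_per_sec * 3600 * usd_per_million_tok / 1_000_000

output_optimized = revenue_per_gpu_hr(1200, 0.90)  # ~$3.89/hr
input_optimized  = revenue_per_gpu_hr(8000, 0.30)  # ~$8.64/hr
output_naive     = revenue_per_gpu_hr(200, 0.90)   # ~$0.65/hr, barely above cost
```

The blended figures in the table then depend on the actual mix of prefill and decode traffic.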

Revenue per MW & Facility
Scaling from GPU-hour to facility-level economics

Formula: GPUs/MW = 1,000,000W ÷ server W/GPU slot

| GPU | W/slot | GPUs/MW | GPU Rental (85% util) | Managed Inference (85%) |
| --- | --- | --- | --- | --- |
| H100 | 1,275W | ~784 | $12.8M/MW/yr | $33.8M/MW/yr |
| B200 | 1,790W | ~559 | $9.1M/MW/yr | $24.1M/MW/yr |
| GB200 | 1,667W | ~600 | $9.8M/MW/yr | $25.9M/MW/yr |

100 MW H100 facility (85% util): GPU rental ~$1.28B/yr. Managed inference ~$3.38B/yr. Industry benchmark: AI facilities generate ~$12.50/watt/year (3x traditional DCs at $4.20/watt). Best operators already exceed $30/watt.

“Revenue per watt” is the master metric for AI data centers. It bridges the infrastructure world to the compute world in a single number.
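Scaling the same formula to the facility level, with the illustrative H100 figures from the table:

```python
# GPUs per MW from server watts per slot, then annual revenue per MW.

def gpus_per_mw(server_watts_per_gpu: float) -> float:
    return 1_000_000 / server_watts_per_gpu

def annual_revenue_per_mw(server_watts_per_gpu: float, revenue_per_gpu_hr: float,
                          utilization: float = 0.85) -> float:
    return gpus_per_mw(server_watts_per_gpu) * revenue_per_gpu_hr * 8760 * utilization

h100_density = gpus_per_mw(1275)                  # ~784 GPUs per MW
rental_rev   = annual_revenue_per_mw(1275, 2.20)  # ~$12.8M/MW/yr at $2.20/hr
```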

Target Margins by Business Model
55-70% managed inference vs 25-35% GPU rental
| Business Model | Gross Margin | Why |
| --- | --- | --- |
| Managed inference (asset-light) | 55-70% | Software optimization on popular models; don’t own infra |
| GPU rental (asset-heavy) | 25-35% | Capital-intensive; GPU amortization is largest cost |
| Crusoe managed (rental + software) | 40-55% | MemoryAlloy layer on owned infra; blended product mix |
| Crusoe energy delta | +5-6 pts | $0.03 vs $0.07-0.10/kWh flows to margin or pricing |

Every dollar shifted from “infrastructure rental revenue” to “software-delivered managed inference” potentially increases valuation multiple. Infrastructure trades at 5-8x revenue; software with recurring revenue at 15-20x.

D3 · Pricing Structures
Per-token, per-GPU-hour, reserved, batch, value-based — each pricing model maps to a different customer segment, margin profile, and strategic intent.
Key Takeaway

Frontier inference cost is declining ~10x annually. Reserved contracts are economically similar to fixed-rate swaps on GPU compute prices.

Think of it like...

Airline pricing — first class, economy, standby, and corporate contracts all sell the same seat-mile at wildly different prices based on flexibility, commitment, and timing.

Customer Segmentation
| Segment | Volume | Pricing Model | Margin |
| --- | --- | --- | --- |
| Hobbyist / prototyping | <1M tok/day | Serverless per-token, free tier | Low/negative |
| Growth startup | 1-100M tok/day | Serverless → on-demand | Medium, expanding |
| Enterprise | 100M+ tok/day | Reserved capacity, custom | High |
| Batch / offline | Large, flexible | Batch pricing (50% off) | Medium (fills idle) |
The Deflation Dynamic

GPT-3.5 equivalent pricing: $20/M tokens (late 2022) → $0.40/M tokens (2025). A 50x decline in ~2.5 years. Reserved contracts lock in today’s price — if costs keep falling, the provider profits on the spread while customers overpay for certainty.
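The annual rate implied by "50x in ~2.5 years" can be backed out directly. A sketch using the figures above:

```python
# Solve end = start * (1 - r)^years for the annual decline rate r.

start_price, end_price, years = 20.0, 0.40, 2.5
annual_decline = 1 - (end_price / start_price) ** (1 / years)
# ~0.79, i.e. prices falling roughly 5x per year over this window
```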

Per-Token (Serverless)
$/million tokens, split input/output

Standard for API providers. Input/output split is critical: output tokens cost 2-4x more to serve because decode is sequential. Without the split, adverse selection occurs — output-heavy workloads subsidized by input-heavy ones.

Cached input pricing (~50% discount) incentivizes consistent prefixes, improving server-side prefix cache hit rates. This is pricing that shapes behavior to reduce costs.

| Model Size Tier | Input $/M | Output $/M |
| --- | --- | --- |
| Under 4B | $0.10 | $0.10 |
| 4B – 16B | $0.20 | $0.20 |
| 16B – 80B | $0.90 | $0.90 |
| MoE 56B – 176B | $0.50 | $1.20 |

Fireworks serverless tiers (open-source models). Cached input at 50% discount; batch mode adds another 50% discount for non-real-time workloads. The MoE output premium reflects decode-bound compute despite efficient routing.

→ Rate limits as pricing levers
Reasoning & Fine-Tune Tiers
Pricing for reasoning tokens and custom adapters

For reasoning models, many teams now separate visible output from reasoning streams (for example reasoning_content) and apply policy caps with effort, budget, and history controls.

Fine-tuned offerings add another pricing layer: base model usage + adapter premium. Fireworks-style LoRA deployment economics (including multi-adapter serving and one-click adapter deployment) allow providers to segment by workflow quality, not just raw token volume.

| Model | Input $/M | Output $/M | Ratio |
| --- | --- | --- | --- |
| DeepSeek V3 (671B MoE) | $0.56 | $1.68 | 3.0× |
| DeepSeek R1 (reasoning) | $3.00 | $8.00 | 2.7× |
| GLM-5-0520 (vision) | $1.50 | $3.00 | 2.0× |
| Kimi K2.5 (MoE) | $0.48 | $1.44 | 3.0× |

Output tokens cost 2–3× input across these models — the ratio reflects decode’s sequential nature. Reasoning models carry the highest absolute rates because extended chain-of-thought generates 5–20× more output tokens per query.

→ Fine-tuning methods and deployment implications
Reserved & Committed
30-60% discounts for 1-3 year commitments

Revenue predictability for both sides. Reserved contracts are economically similar to fixed-rate swaps: provider receives fixed payments, effectively pays floating (market cost of delivering compute). In a deflationary environment, existing contracts become more valuable.

The real risk isn’t the contract price — it’s what the contract signals about capacity planning. Contracted capacity is hedged; uncontracted capacity is an outright bet on future pricing.

Batch Pricing
40-50% discount for non-real-time workloads

Like standby airline tickets — fills idle capacity. Customer gives up latency guarantees in exchange for significant discount. Provider gains utilization in off-peak periods.

Batch workloads smooth demand curves: if real-time peaks at 2pm, batch jobs absorb 2am-6am idle capacity. This improves overall fleet utilization from 40-50% to 80-90%.

Strategic Pricing
Pricing as competitive weapon

Land and expand: Price serverless tier aggressively (even below cost) to acquire developers, monetize at scale when usage grows.

Value-based: Voice agent inference priced per-minute, not per-token. Aligns with customer’s value perception — they think in “minutes of agent time,” not tokens.

Competitive moat: If you have 2x throughput advantage (FireAttention), price 30% below competitors while maintaining better margins. Your optimization is their impossibility.

E · Business Models
Company-level decisions — build, rent, or serve
E1 · Managed Inference vs GPU Rental
Two fundamentally different businesses: selling outcomes (tokens) vs selling infrastructure (GPUs). Statistical multiplexing is the magic of managed inference.
Key Takeaway

Managed inference sells outcomes at 55-70% gross margin via statistical multiplexing. GPU rental sells infrastructure at 34-40% margin with simpler operations but massive CapEx.

Think of it like...

Uber vs Hertz. Uber sells rides (outcomes) and pools cars across thousands of passengers. Hertz rents you the car — you worry about driving. Both make money on vehicles, but the economics are completely different.

Model Comparison
| Dimension | Managed Inference | GPU Rental |
| --- | --- | --- |
| What you sell | Tokens, completions, minutes | Raw GPU-hours |
| Customer thinks about | Outcomes | Infrastructure |
| Gross margin | 55-70% | 34-40% |
| Key advantage | Statistical multiplexing | Simpler operations |
| CapEx intensity | Lower (can rent GPUs) | Very high |
| R&D cost | High (serving stack) | Moderate (platform) |
| Risk | Correlated demand spikes | Utilization & pricing deflation |
Statistical Multiplexing
The insurance math behind managed inference

Statistical multiplexing — Customer A peaks at 2pm, Customer B at 6pm, Customer C runs batch overnight. Pooling across thousands of customers enables 80-90% GPU utilization, far higher than any single customer achieves.

Like insurance pooling — the law of large numbers applies to token traffic. The risk: correlated demand spikes. When a new model drops or a viral AI demo happens, everyone hits the API simultaneously. This is the “catastrophic event” requiring burst capacity or graceful degradation.

→ How continuous batching enables this
Pricing Design Decisions
Input/output split, cached pricing, model catalog

Input vs output split: Output tokens cost 2-4x more to serve. Without the split, adverse selection occurs — output-heavy workloads are subsidized.

Cached input pricing: ~50% discount incentivizes consistent prefixes, improving prefix cache hit rates. Pricing that shapes behavior.

Model-specific pricing: Based on cost to serve, competitive pricing, demand elasticity, and strategic value. Popular models subsidize long-tail catalog with sparse traffic.

| GPU | Fireworks On-Demand | Crusoe On-Demand | Crusoe Spot |
| --- | --- | --- | --- |
| H100 SXM | $5.49/hr | $3.90/hr | $1.60/hr |
| A100 80 GB | $3.19/hr | $1.95/hr | $1.30/hr |

The managed premium: Fireworks charges ~40% more than Crusoe on-demand for the same GPU because the rate bundles inference-optimized software (FireAttention, LoRA multiplexing, autoscaling). Crusoe spot at $1.60/hr represents raw infrastructure. The spread ($1.60 → $5.49) is where the entire inference software stack gets monetized.

PM Decisions
Catalog, rate limits, deprecation, optimization pass-through

1. Model catalog: Which models to add and when — each has hosting cost whether used or not. Warm GPUs for a model nobody calls is pure waste.

2. Rate limits & SLA tiers: Maps to burst capacity reserved per customer. Higher tier = more dedicated headroom = higher price.

3. Model deprecation: Old models eat GPU memory, but enterprise customers depend on them. Migration timelines are diplomatic minefields.

4. Optimization pass-through: When your team ships a 2x throughput improvement, do you pass savings to customers (growth) or keep as margin (profitability)?

Cannibalization Tension
When managed inference undermines GPU rental

When Crusoe launches Managed Inference alongside GPU rental, they create internal tension. If MemoryAlloy delivers 9.9x better TTFT and 81% cost reduction, why would any inference customer rent raw GPUs?

How MemoryAlloy works: a cluster-wide KV cache that uses RDMA to pool GPU memory across nodes. Instead of each GPU holding its own cache, MemoryAlloy treats the entire cluster’s HBM as a shared resource. This enables instant context reuse across requests — the 9.9× TTFT improvement comes from eliminating redundant prefill when cached KV states exist elsewhere in the cluster. Supported models include Llama 3.1/3.3, DeepSeek R1/V3, and Qwen 2.5.

The resolution: GPU rental increasingly serves training customers and custom workloads. Managed inference captures inference demand at higher margin. Total revenue per GPU potentially goes up — but the PM must model the revenue migration carefully.

Infrastructure Platform Features
VPC, InfiniBand, managed orchestration, and SLA guarantees

GPU cloud providers differentiate on platform features beyond raw compute. Crusoe’s stack illustrates what enterprise buyers expect:

Networking: InfiniBand for GPU-to-GPU, NVLink within nodes, RDMA for cluster-wide memory access. These determine whether multi-node inference and distributed training are viable.

Orchestration: Managed Kubernetes (K8s) and Slurm for job scheduling. AutoClusters provision multi-node GPU environments in minutes rather than days of manual configuration.

Isolation: VPC (Virtual Private Cloud) for network isolation, per-minute billing granularity, and enterprise security boundaries.

SLAs: 99.98% uptime guarantee, <6 minute support response time. These commitments unlock enterprise procurement cycles that require contractual guarantees.

Hardware roadmap: GB200, B200, and MI355X availability signals to customers that the provider won’t strand them on last-gen hardware — critical for multi-year capacity planning.

Revenue Multiplier: Rental vs Managed
Managed inference generates 1-7x more revenue per GPU-hour on popular models
| Model | GPU Rental Revenue/hr | Managed Inference/hr (optimized) | Multiplier |
| --- | --- | --- | --- |
| Llama 3.1 8B | $2.20 (full H100, overkill) | $8-15/hr (high throughput, massive volume) | 4-7x |
| Llama 3.1 70B | $2.20 × 2 GPUs = $4.40 | $5-10/hr (moderate throughput) | 1-2x |
| Llama 3.1 405B | $2.20 × 8+ GPUs = $17.60+ | $8-15/hr (low throughput, premium price) | 0.5-1x |

Managed inference generates more revenue per GPU-hour for popular small-to-mid models (where batching and optimization shine). For very large models (405B+), GPU rental can yield comparable revenue because batch sizes are limited and multi-GPU overhead is high.

The IPO Lens
Product mix directly affects valuation multiples

CapEx-heavy companies trade at lower revenue multiples. Pure SaaS growing 50% trades at 15-20x revenue. Infrastructure company growing 50% with 60% CapEx intensity trades at 5-8x.

But recurring infrastructure revenue has its own premium. Long-term contracted data center revenue is valued almost like a bond. Digital Realty and Equinix trade at 20-30x AFFO because revenue is contracted and recurring.

Crusoe’s IPO narrative: If positioned as “we spend billions on data centers” → infrastructure multiples. If positioned as “we have $1.5B in contracted cloud ARR growing 5x with managed inference software on top” → blended multiples. Product mix decisions directly affect how the market values the entire company.

E2 · Buy vs Rent GPUs
Breakeven at 44% utilization before WACC, 59% after. Technology obsolescence makes buying a depreciating asset bet; renting buys a monthly call option on compute.
Key Takeaway

Buying a GPU = being long a depreciating asset with uncertain future value. Renting = buying a monthly call option on compute capacity. The optimal strategy is a barbell: own base-load, rent flexibility.

Think of it like...

Electric utilities: own baseload power plants (nuclear, hydro — predictable, cheap per kWh) and buy peaking capacity on the spot market (gas turbines — flexible, expensive per kWh).

Residual Value Scenarios
| Scenario | H100 Residual (3yr) | Value | Implication |
| --- | --- | --- | --- |
| Bull case | 40% | ~$11K | Ownership strongly favored |
| Base case | 20% | ~$5.5K | Ownership favored at high utilization |
| Bear case | 5% | ~$1.4K | Next-gen makes it essentially worthless |
TCO Breakdown
Buy-side: $1.10/hr fully loaded vs rent at $2.50/hr

Buy side TCO per GPU-hour: Purchase price amortized over useful life + cost of capital + power/cooling + maintenance (2-5% annual failure rate) − residual/salvage value.

Depreciation tension: H100 straight-line over 3 years = ~$0.95/hr. Over 5 years = ~$0.57/hr. But GPU tech cycles are accelerating: H100 (2022) → H200 (2024) → B200 (2025). If Blackwell delivers 3-5x performance per dollar, H100 depreciating over 5 years is economically obsolete before fully depreciated.

Rent side: ~$2.00-3.00/hr for H100. No upfront CapEx. Flexibility to scale. Obsolescence risk sits with the lessor — but you’re paying their margin.

Cost of Capital Changes Everything
WACC shifts breakeven from 44% to 59-76%

If blended WACC for GPU purchases is 12%, the $28K H100 has economic cost of $3,360/year in capital charges alone — adding ~$0.38/hr to ownership cost, pushing fully loaded from $1.10 to ~$1.48/hr.

Breakeven utilization moves from 44% to ~59%. If funded with pure equity at 25%, breakeven jumps to ~76%. Interest rate environment matters: 300-400 bps difference on $1B GPU purchase = $30-40M/year in additional interest expense.
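A sketch of the breakeven math under these assumptions: straight-line amortization plus a simple annual capital charge, against the text's illustrative $28K purchase price and $2.50/hr rental rate.

```python
# Utilization at which owning a GPU matches renting one, as a function of
# the cost of capital. Simplified: ignores power, maintenance, and residual value.

def breakeven_utilization(price: float, years: float, wacc: float,
                          rental_rate_per_hr: float) -> float:
    annual_cost = price / years + price * wacc   # amortization + capital charge
    hourly_cost = annual_cost / 8760
    return hourly_cost / rental_rate_per_hr

no_capital_charge = breakeven_utilization(28_000, 3, 0.00, 2.50)  # ~0.43
debt_blend        = breakeven_utilization(28_000, 3, 0.12, 2.50)  # ~0.58
pure_equity       = breakeven_utilization(28_000, 3, 0.25, 2.50)  # ~0.75
```

The text's 44%/59%/76% figures include small operating costs this sketch omits, which nudges each breakeven up slightly.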

Technology Obsolescence
Option value of renting vs owning

Renting = buying a monthly call option on compute. You pay a premium but maintain optionality to switch hardware, scale down, or pivot. Option value increases when volatility is high (AI hardware changing every 12-18 months), time horizon is uncertain, and interest rates are high.

Buying is attractive when: Demand is highly predictable (Crusoe’s 15-year Abilene lease), structural cost advantage exists (cheap power extends economic life), and the asset can be redeployed across use cases.

→ GPU Memory Hierarchy details
The Barbell Strategy
Own base-load, rent flexibility

The optimal strategy: heavy ownership of base-load capacity funded by low-cost infrastructure debt secured against long-term contracts, combined with rental/spot capacity for flexibility.

Like utilities: own baseload plants, buy peaking capacity on the spot market. The owned portion provides cost advantage; the rented portion provides optionality. The ratio depends on demand predictability and cost of capital.

E3 · Data Center Economics
$/kW/month is the unit, not $/sqft. Power is the binding constraint — AI racks draw 40-140 kW vs 2-4 kW for legacy, making space essentially free.
Key Takeaway

Crusoe's ~$50/kW/month energy advantage at a 100MW facility = $60M/year in structural cost savings that flow directly to margin or competitive pricing.

Think of it like...

Real estate for aluminum smelters — you don’t price by square footage because the smelter’s value is entirely determined by access to cheap electricity. Same with AI data centers.

Facility Type Ranges
| Facility Type | $/kW/month |
| --- | --- |
| Wholesale colocation (legacy) | $80-120 |
| Retail colocation | $120-180 |
| AI-optimized facility | $150-250+ |
| Hyperscaler self-build | $50-80 effective |
Crusoe’s Energy Advantage

$0.03/kWh (Crusoe) vs $0.10/kWh (grid, NoVA) → ~$50/kW/mo advantage → ~$60M/year savings at 100 MW

Traditional operator at $0.10/kWh: 1 kW continuous for a month (730 hrs) = $73 just in electricity. Out of a $150/kW/month price, nearly half is power. Crusoe at $0.03/kWh: the same 1 kW = $22/month.

Crusoe Facility Portfolio
| Location | Capacity | Energy Source |
| --- | --- | --- |
| Abilene, TX | 1.2 GW | Grid + renewables |
| Iceland | Expanding | Geothermal (near-zero carbon) |
| Wyoming | Operating | Stranded natural gas |
| Norway | Planned | Hydroelectric |
| Argentina | Planned | Stranded gas (Vaca Muerta) |

Crusoe’s Abilene campus alone at 1.2 GW would rank among the largest single-site data center deployments globally. The geographic diversification across clean/stranded energy sources hedges against regional power regulation and positions the company for enterprise customers with carbon-reduction mandates. A nuclear energy partnership with Blue Owl Capital adds baseload diversification beyond renewables.

Power as Binding Constraint
AI racks draw 30-70x more power than legacy

Legacy server rack: 2-4 kW. AI rack with 8 H100s: 40-70 kW. GB200 NVL72 rack: 120-140 kW. That’s 30-70x more power but roughly the same physical footprint. Pricing by square footage breaks completely.

Power delivery infrastructure alone costs millions: transformers ($1-5M each), switchgear, UPS, backup generators, distribution. Redundancy requirements (N+1 or 2N) mean 100 kW provisioned → 200 kW built.

Cooling at Scale
Every watt consumed = a watt of heat to remove

Every watt consumed becomes a watt of heat to remove. At 140 kW/rack, air cooling physically cannot keep up — liquid cooling is required. Cost scales linearly with kW.

PUE captures this overhead. PUE of 1.1 (liquid) means 10% cooling overhead. PUE of 1.4 (air) means 40%. The difference on a 100MW IT load: an extra 30MW consumed just for cooling, roughly $26M/year at $0.10/kWh grid rates.

What $/kW/month Bundles
Power, cooling, redundancy — space is essentially free

The $/kW/month price bundles: power delivery infrastructure (transformers, switchgear, UPS, generators, distribution), cooling capacity (linearly proportional to power), redundancy (N+1 or 2N), and physical space (essentially free at this point — tiny fraction of total cost).

This is why the metric is $/kW/month: it captures the actual scarce resource (power capacity) rather than the abundant one (floor space).

CapEx Breakdown by Component
$35-45M/MW all-in — GPUs are 55-65% of total
| Component | Industry $/MW | Crusoe $/MW | Advantage? |
| --- | --- | --- | --- |
| Shell & core construction | $9-15M | $8-12M | Modest (cheap land, modular build) |
| AI fit-out (GPUs, networking) | $20-25M | $20-25M | None — GPU cost is GPU cost |
| Total all-in per MW | $35-45M | $28-37M | — |

Shell & core components: Land ($100-500/kW, 1-3% of total), power infrastructure ($2-4.5K/kW, 20-30% — Crusoe saves on grid interconnection via behind-the-meter), cooling ($1.5-3K/kW, 15-20% — cold-climate sites save on heat rejection), building ($600-1.2K/kW, 5-10%), networking ($350-900/kW, 3-7%).

GPU depreciation mismatch is the biggest financial risk. Accounting: 3-5 years. Economic life: 2-3 years. Crusoe’s cheap power extends viability of older GPUs for batch workloads — converting a liability (aging GPUs) into an asset (cheap compute).

OpEx Breakdown & Crusoe Advantage
$50-80M/year savings at 100 MW — almost all electricity & cooling
| Component | Industry $/MW/yr | Crusoe $/MW/yr | Advantage? |
| --- | --- | --- | --- |
| Electricity | $675K-1,050K | $190-440K | Yes — structural |
| Cooling operations | $150-400K | $50-150K | Yes — partial |
| Personnel | $200-400K | $200-450K | No — possibly higher (rural premiums) |
| Hardware maintenance | $150-350K | $150-350K | No — GPU failure rates are physics |
| Networking & connectivity | $50-120K | $60-150K | No — dark fiber to rural costs more |
| Other (insurance, software, etc.) | $80-280K | $80-280K | No |
| Total OpEx | $1.3-2.7M | $810K-1.9M | |

Critical honesty: The $50-80M/year advantage at 100 MW is almost entirely electricity and cooling (~$48-75M of total). Personnel, maintenance, networking, and compliance run at industry rates or slightly higher due to remote locations. The energy advantage is large enough to dominate total OpEx regardless.

The math that defines the moat: 1 kW × 730 hrs/mo at $0.10/kWh = $73/kW/mo. At $0.03/kWh = $22/kW/mo. Delta: $51/kW/mo = $612/kW/yr. At 1.2 GW (Abilene full scale): $735M/year structural advantage.
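The same arithmetic in code. Exact values land at $51.10/kW/mo and ~$736M/year; the text rounds to $51, $612, and $735M:

```python
# The power-cost moat, using the text's 730 hrs/month convention.
HOURS_PER_MONTH = 730
grid_rate, crusoe_rate = 0.10, 0.03   # $/kWh, rates from the text

cost_grid = HOURS_PER_MONTH * grid_rate      # $73.0 /kW/month
cost_crusoe = HOURS_PER_MONTH * crusoe_rate  # $21.9 /kW/month
delta_month = cost_grid - cost_crusoe        # ~$51 /kW/month
delta_year = delta_month * 12                # ~$613 /kW/year

fleet_kw = 1.2e6                             # 1.2 GW at Abilene full scale
advantage = fleet_kw * delta_year            # ~$736M/year structural advantage

print(f"${delta_month:.2f}/kW/mo -> ${delta_year:.0f}/kW/yr "
      f"-> ${advantage / 1e6:.0f}M/yr at 1.2 GW")
```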

Colocation Lease Benchmarks +
$120-450/kW/month by market — seller’s market
| Market | $/kW/month (2025) | Notes |
| --- | --- | --- |
| US wholesale average | $163-184 | CBRE H1 2025: $184/kW/mo |
| Northern Virginia (Ashburn) | $200-215+ | Record highs, <2% vacancy |
| Silicon Valley | $200-250+ | Highest US cost market |
| Phoenix | ~$190 | Rapid growth, 1.7% vacancy |
| Dallas/Texas | $140-170 | Business-friendly regulatory |
| Atlanta | $120-150 | Cheapest major US market |
| AI-optimized high-density | $180-300+ | Premium for liquid cooling, 100+ kW/rack |
| Iceland/Nordics | $80-130 | Cheap hydro/geothermal; higher connectivity |
| Singapore | $350-450+ | Most expensive globally |

Key trend: Nearly 75% of the 5,242 MW under construction in North America is pre-leased. Some commitments extend to capacity not delivering until 2027+. This is a seller’s market — every MW of operational capacity commands premium pricing.

Crusoe as competitor: Energy advantage means Crusoe can price at $100-140/kW/mo and still achieve superior margins vs competitors at $160-200/kW/mo.

100 MW Reference Model +
Total economics for a facility at scale
| Category | Industry All-In | Crusoe Estimated |
| --- | --- | --- |
| Total CapEx | $3.0-4.0B | $2.8-3.7B |
| · Shell & core | $900M-1.5B | $800M-1.2B |
| · AI fit-out (GPUs, networking) | $2.0-2.5B | $2.0-2.5B |
| Annual OpEx | $130-250M | $75-175M |
| · Electricity | $67-105M | $19-44M |
| · Cooling | $15-40M | $5-15M |
| · Personnel | $20-40M | $20-45M |
| · Hardware maintenance | $15-35M | $15-35M |
| Revenue (85% util) | $200-350M | $200-350M |
| Target Gross Margin | 35-50% | 40-55% |

The CapEx savings ($100-300M) are real but limited to shell & core — NVIDIA prices GPUs the same for everyone. The margin improvement (5-10 points) comes from OpEx: $50-80M/year in electricity and cooling savings on $200-350M revenue. Over a 15-year facility life, a ~$65M/year average advantage sums to nearly $1B in cumulative savings.

F
Capital Structure
How it all gets financed — debt, equity, and the cost of capital
F1 Why Equity Costs More Than Debt Economics +
Debt holders get paid first, accept lower returns. Equity holders are last in line with uncapped upside and uncapped downside — demanding much higher required returns.
Key Takeaway

Equity costs 3-4x more than debt because equity holders bear residual risk with no contractual protection. The tax shield on debt interest makes the gap even wider.

Think of it like...

A building with floors. Debt holders live on the ground floor — if the building sinks, they’re last to get wet. Equity holders live in the penthouse — great view, but first to feel any earthquake. Higher floor = higher risk = higher rent.

The Risk Hierarchy
| Dimension | Debt | Equity |
| --- | --- | --- |
| Priority in liquidation | First (senior) | Last (residual) |
| Cash flows | Contractual (fixed interest) | Residual (dividends optional) |
| Upside | Capped at interest rate | Uncapped |
| Downside | Protected by covenants | Can lose everything |
| Tax treatment | Interest is deductible | Returns are not |
| Typical cost | 7-10% | 20-30%+ |
Tax Shield +
Interest deductibility widens the cost gap

Interest payments are tax-deductible. At 21% corporate tax rate, 8% interest → ~6.3% after-tax cost. Equity returns have no such benefit.

On $5B of debt at 8%, the tax shield is worth $84M/year — pure value creation from choosing debt over equity for the same investment.
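Both numbers fall straight out of the definitions:

```python
# After-tax cost of debt and annual tax shield, using the text's inputs.
rate, tax = 0.08, 0.21   # 8% interest, 21% corporate tax rate
debt = 5e9               # $5B of debt

after_tax_cost = rate * (1 - tax)   # ~6.3% effective cost of debt
shield = debt * rate * tax          # $84M/year of tax saved

print(f"After-tax cost of debt: {after_tax_cost:.2%}")
print(f"Annual tax shield on $5B: ${shield / 1e6:.0f}M")
```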

Contractual vs Residual Claims +
Debt holders price for downside protection

Debt holder’s max upside is the interest rate (8%). They price for downside protection via covenants, collateral, and priority. Equity investors face uncapped upside and uncapped downside — they need much higher expected returns to justify the risk.

Information asymmetry compounds this: debt holders protect themselves with covenants, while equity holders have board seats but can’t contractually force profitability. More trust required = riskier = more expensive.

Why Not All Debt? +
Financial distress, capacity limits, rising marginal cost

If debt is cheaper, why not fund everything with debt? Three limits:

1. Financial distress risk: Too much leverage makes a temporary downturn existential. Missing one interest payment can trigger default cascades.

2. Debt capacity limits: Lenders won’t fund beyond cash flow coverage. The DSCR sets a hard ceiling.

3. Rising marginal cost: First billion at 8%, fifth billion at 12%. At some point, additional debt costs more than equity.
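The trade-off can be sketched as a textbook WACC calculation using this section's illustrative costs (8% debt, 25% equity, 21% tax). The mixes shown are hypothetical:

```python
def wacc(debt_frac, cost_debt=0.08, cost_equity=0.25, tax=0.21):
    """Weighted average cost of capital, with the debt tax shield applied."""
    return debt_frac * cost_debt * (1 - tax) + (1 - debt_frac) * cost_equity

for d in (0.0, 0.4, 0.7):
    print(f"{d:.0%} debt -> WACC {wacc(d):.1%}")
```

In reality the 8% and 25% inputs are not constants: as leverage climbs, lenders reprice (limit 3 above) and distress risk pushes the cost of equity up too, so the curve bottoms out instead of falling forever.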

F2 Contracted Revenue & Lenders Economics +
Not collateral — cash flow underwriting. Contracted revenue answers the question lenders care about most: “How likely is it that you default?” via DSCR mechanics.
Key Takeaway

A 40% commitment discount looks expensive in isolation, but the contracted revenue it creates can unlock debt capacity whose NPV far exceeds the discount given. Pricing decisions directly affect capital structure.

Think of it like...

Getting a mortgage. The bank doesn’t just look at the house (collateral) — they look at your salary and employment contract (contracted revenue). A tenured professor with $100K salary gets a bigger mortgage at a lower rate than a freelancer earning $150K with no contract.

The Four Mechanisms
- 3-4x higher debt capacity
- 150 bps lower interest rate
- 2-3x longer tenor
- $75M annual interest savings (on $5B)
Higher Debt Capacity +
DSCR math: same facility, dramatically different borrowing

Lenders size loans using DSCR (debt service coverage ratio): they want cash flow to cover annual debt service 1.3-2.0x.

Without contracted revenue: the lender underwrites to a conservative $300M of revenue → $150M FCF → supports ~$100M/year debt service → ~$1-1.5B total debt capacity.

With a 15-year Oracle contract at $600M/year: the lender underwrites to the full $600M → $400M FCF → supports ~$267M/year debt service → ~$3-4B total debt capacity. Same facility, dramatically different borrowing.
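A minimal sketch of the sizing step, assuming a mid-band 1.5x DSCR target (the text gives a 1.3-2.0x range):

```python
# DSCR-based sizing of maximum annual debt service (illustrative).
def max_debt_service(fcf, dscr=1.5):
    """Largest annual debt service a lender will allow at a target DSCR."""
    return fcf / dscr

uncontracted = max_debt_service(150e6)   # underwritten to $150M FCF
contracted = max_debt_service(400e6)     # 15-yr contract: $400M FCF

print(f"Supportable debt service: ${uncontracted / 1e6:.0f}M/yr "
      f"vs ${contracted / 1e6:.0f}M/yr")
```

Total debt capacity then follows from rate and tenor, which is why the same facility can carry ~$1-1.5B uncontracted but ~$3-4B contracted.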

Lower Interest & Looser Covenants +
SOFR + 400bps → SOFR + 250bps

Contracted revenue with investment-grade counterparty compresses spread: SOFR + 400bps → SOFR + 250bps. On $5B = $75M/year in interest savings.

Looser covenants: fewer restrictions on additional borrowing, less stringent maintenance ratios, more operational flexibility. Longer tenor: 15-year contracted revenue supports 10-12 year debt vs 3-5 years for uncontracted.

Assignment of Contracts +
Quasi-collateral: lenders step into payment streams

Lenders structure assignment of contracts as security interest. If Crusoe defaults, JPMorgan could step into Crusoe’s position and receive Oracle’s lease payments directly.

Like mortgage-backed securities — the payment streams themselves are the security. Not the physical assets, but the right to receive contracted cash flows.

Longer Tenor = Competitive Weapon +
$127M/year less debt service on $1B — 12yr vs 5yr

15-year contracted revenue supports 10-12 year debt. Uncontracted supports only 3-5 years. The math on identical $1B borrowed:

| Metric | 5-year at 8% | 12-year at 7% | Difference |
| --- | --- | --- | --- |
| Annual principal | $200M | ~$83M | |
| Year 1 interest | ~$80M | ~$70M | |
| Total Year 1 debt service | ~$280M | ~$153M | $127M freed |
| DSCR on $400M FCF | 1.43x (tight) | 2.61x (comfortable) | |
| Revenue decline before breach | 9% | 45% | |

$127M/year less in debt service to reinvest, build capacity, or weather downturns. A competitor with shorter-tenor debt has higher fixed obligations and less room to cut prices during a price war. Debt structure becomes a competitive weapon.

The Abilene example: $9.6B JPMorgan facility secured against 15-year Oracle/Stargate lease. On 5-year debt, annual principal alone = ~$1.92B vs ~$800M on 12-year. The difference ($1.1B/year) would make the project borderline unfinanceable.
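The year-1 figures above come from straight-line principal plus first-year interest; a quick sketch reproducing them:

```python
# Year-1 debt service on $1B: 5yr at 8% vs 12yr at 7%, straight-line principal.
def year1_service(principal, rate, tenor_years):
    """First-year debt service: equal principal amortization + interest."""
    return principal / tenor_years + principal * rate

short = year1_service(1e9, 0.08, 5)    # ~$280M
long = year1_service(1e9, 0.07, 12)    # ~$153M
fcf = 400e6

print(f"Freed cash: ${(short - long) / 1e6:.0f}M/yr")
print(f"DSCR: {fcf / short:.2f}x (5yr) vs {fcf / long:.2f}x (12yr)")
```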

The Virtuous Cycle +
Contracts → cheaper debt → lower WACC → more contracts

Every 1-year or 3-year GPU reservation contract signed makes the debt package more attractive to lenders → cheaper debt → lower WACC → more competitive pricing → more customers → more contracts → even more debt capacity.

This is why commitment discounts aren’t just about revenue: they’re a capital structure optimization. The PM must model the full-cycle NPV, not just the direct pricing impact.

Reserved pricing on owned infrastructure: At Crusoe's $1.55/hr cost, a 40% reserved discount off the $2.20/hr list price yields $1.32/hr — a -15% GPU-hour margin. But the contracted revenue enables debt at SOFR + 250bps instead of + 400bps and extends tenor from 5yr to 12yr. The capital structure benefit (lower WACC, $127M/yr less debt service per $1B) often exceeds the margin given away. The correct analysis is NPV of the entire capital structure impact, not P&L on the individual GPU-hour.
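A quick check of the margin math, assuming the 40% discount applies to the $2.20/hr list price used elsewhere on this page, with margin computed on cost:

```python
# Reserved-discount margin on a single GPU-hour (illustrative).
list_price, cost = 2.20, 1.55
discount = 0.40

reserved = list_price * (1 - discount)   # $1.32/hr reserved price
margin = (reserved - cost) / cost        # ~-15% per GPU-hour, on cost

print(f"${reserved:.2f}/hr against ${cost:.2f}/hr cost -> {margin:.0%} margin")
```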

F3 Debt vs Equity Across Stages Economics +
From Series A (95-100% equity) to public company (optimized mix). As cash flows become more predictable, the optimal debt/equity ratio shifts dramatically.
Key Takeaway

Debt requires predictability. Equity tolerates uncertainty. Crusoe runs two capital structures in one company: project-finance-funded infrastructure + venture-funded software.

Think of it like...

A person’s financial life. College student = 100% “equity” (parental support, scholarships). First job = some debt (car loan). Established career = mortgage, credit lines. Retired = optimized portfolio. More predictable income unlocks more leverage.

Interactive: Capital Structure by Stage
Series A → Series B +
95-100% equity → 80-90% equity

Series A: No revenue, no assets, no track record. Might not exist in 18 months. Cost of equity: 50-100%+ implied (VCs need 100x potential). Debt is unavailable. Mix: 95-100% equity. Exception: venture debt (20-30% of last equity round) with warrants.

Series B: $5-20M ARR, real product, paying customers. Cost of equity: 30-50%. Cost of debt: 12-15% (venture debt with warrants). Mix: 80-90% equity. Equipment financing becomes possible (80% LTV on GPU purchases).

Series C/D → Pre-IPO +
60-80% equity → 40-60% equity

Series C/D: $50-200M+ ARR, proven unit economics, enterprise contracts. Cost of equity: 15-25%. Cost of debt: 8-12% (term loans, asset-backed). Mix: 60-80% equity. CoreWeave pioneered billions in GPU-backed debt at this stage.

Pre-IPO: $200M-1B+ revenue, clear profitability path. Cost of equity: 12-20%. Cost of debt: 6-9%. Mix: 40-60% equity. Convertible notes popular: lower interest (2-5%) with embedded call option. Crusoe is here now (Series E, $10B+).

Public Company +
Full optimization — cost of equity drops dramatically

Cost of equity drops dramatically: 10-15% (liquidity, transparency, diversifiability). Cost of debt: 4-7% (investment-grade bonds, commercial paper). Gap narrows but never closes — debt holders are always paid first.

Active capital management: share buybacks, debt-funded buybacks, dividend policy, credit rating management. The CFO becomes a portfolio manager optimizing the capital structure continuously.

Crusoe’s Dual Structure +
Two capital structures in one company

Crusoe runs two capital structures simultaneously:

Project-finance infrastructure: Abilene’s $9.6B debt + $5B equity. Low cost of capital, asset-heavy, contracted cash flows. Like a utility or pipeline company.

Venture-funded software: Cloud platform, managed inference. High cost of capital, asset-light, uncertain returns. Like a typical tech startup.

The PM must understand which investments belong to which bucket. Managed inference feature ($20M) → equity-funded, needs venture-scale returns. Data center with signed contract → leverage cheap debt, lower hurdle rate.
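The bucket determines the discount rate, and the discount rate can flip the decision. A minimal sketch with hypothetical cash flows ($100M out, $25M/yr back for eight years) valued at a project-debt hurdle vs a venture-equity hurdle:

```python
# Same cash flows, two hurdle rates: why bucket assignment matters.
def npv(cashflows, rate):
    """NPV with the first cash flow at t=0, subsequent flows annual."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

flows = [-100e6] + [25e6] * 8   # hypothetical infrastructure-style project

print(f"At  8% (project-debt hurdle):   ${npv(flows, 0.08) / 1e6:+.0f}M")
print(f"At 25% (venture-equity hurdle): ${npv(flows, 0.25) / 1e6:+.0f}M")
```

The identical project is comfortably positive at an infrastructure cost of capital and clearly negative at a venture hurdle — which is why a contracted data center and a speculative software feature cannot share one hurdle rate.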

CapEx in Supply-Constrained Markets +
Operational facilities worth more than construction cost

The default mental model is wrong right now. Normally, you spend $100 on an asset and it starts depreciating: worth less than $100 almost immediately. AI data center infrastructure breaks this because of extreme supply-demand imbalance.

Supply side — a cascade of sequential bottlenecks:

1. Power interconnection (3-5 year queue) — can't buy your way to the front.
2. Permitting (12-18 months) — single legal challenge adds a year.
3. Skilled labor (fully booked) — 140 kW/rack facilities need specialized contractors.
4. Equipment (12-24 months) — switchgear up 50%, generators up 45% since 2021.
5. GPUs (allocation-constrained).

Total: 2-4 years from “decide to build” to “serving customers.” A completed facility represents the crystallized result of navigating every bottleneck. The real estate parallel: a house costs $400K to build but sells for $1.2M — the delta is the embedded scarcity of entitled, permitted, connected land.

When this reverses: Supply catches up (hyperscaler buildouts complete), demand plateaus (AI scaling hits diminishing returns), or technology shifts (inference efficiency, on-device inference). Crusoe’s energy advantage persists even if scarcity fades — cheap power remains valuable regardless.

The Barbell Approach +
Go heavy on both extremes, avoid the expensive middle

Core principle: Commit heavily where certainty is high. Maintain cheap optionality where it isn’t. Avoid the expensive middle of moderate commitment with moderate flexibility.

| Domain | Owned Extreme (~80%) | Flex Extreme (~20%) | Middle to Avoid |
| --- | --- | --- | --- |
| GPU Fleet | Owned GPUs for contracted customers; equipment-backed debt | Rented spot/on-demand for spikes | Buying on speculation without revenue certainty |
| Data Centers | Owned facilities with 15yr leases (Abilene) | Leased colo in new markets to validate demand | Building 500 MW with only 100 MW contracted |
| Power | Stranded gas/hydro/geothermal; long-term PPAs | Spot grid for peaks | Building 200 MW turbine for uncertain demand |
| Model Catalog | Deep optimization on top 10-15 models (80%+ revenue) | vLLM defaults on 80+ long-tail models | Moderate optimization across 50 models |
| Customers | Enterprise ($500K-5M+ contracts, high touch) | Self-serve developers (near-zero CAC) | Mid-market ($50-200K, significant sales effort, poor LTV/CAC) |
| Pricing | Aggressive serverless (acquisition, below cost) | Premium enterprise (40-60% margin) | Moderate pricing that attracts neither segment |

Why barbell works in AI: Extreme demand uncertainty (single model release shifts demand by orders of magnitude), rapid technology obsolescence (GPU generations every 18-24 months), and power law economics (small number of models/customers drive vast majority of revenue). Invest heavily in winners, maintain cheap options on everything else.

→ Foundation model funding & mega-rounds