How AI inference is priced, built, financed, and scaled — the economics behind every API call.
The technical pipeline follows the token from request to response. This page follows the dollar — from the cost of a single GPU-hour through pricing, business models, and capital markets.
Cost per million tokens = Total hourly GPU cost ÷ tokens served per hour. Throughput optimization is literally margin expansion.
A restaurant's cost per meal — ingredients, rent, labor, utilities. Two restaurants with identical kitchens can have wildly different costs per plate based on how many covers they turn.
| Component | Crusoe (owns infra) | CoreWeave (leases some) |
|---|---|---|
| GPU CapEx amortized (3yr) | ~$0.95/hr | ~$1.10/hr |
| Power | ~$0.03/hr | ~$0.07/hr |
| Data center (amortized) | ~$0.15/hr | ~$0.25/hr |
| Networking | ~$0.10/hr | ~$0.10/hr |
| Operations/platform | ~$0.10/hr | ~$0.12/hr |
| Total cost per H100-hr | ~$1.33/hr | ~$1.64/hr |
| Selling price | $2.20/hr | $2.20/hr |
| Gross Margin | ~40% | ~25% |
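The cost stack above can be checked in a few lines. The throughput figure (an assumed ~10,000 tokens/sec aggregate for a small model under continuous batching) is illustrative, not from the table:

```python
# Unit economics for one H100-hour, using the Crusoe-style cost stack above.
cost_stack = {
    "gpu_capex_amortized": 0.95,  # $/hr, 3-year straight line
    "power": 0.03,
    "datacenter": 0.15,
    "networking": 0.10,
    "operations": 0.10,
}
total_cost = sum(cost_stack.values())          # ~$1.33/hr
selling_price = 2.20
gross_margin = (selling_price - total_cost) / selling_price

# Cost per million tokens = hourly GPU cost / tokens served per hour.
tokens_per_second = 10_000                     # assumed aggregate throughput
tokens_per_hour = tokens_per_second * 3600
cost_per_m_tokens = total_cost / tokens_per_hour * 1e6

print(f"total cost:   ${total_cost:.2f}/hr")
print(f"gross margin: {gross_margin:.0%}")
print(f"cost/M tok:   ${cost_per_m_tokens:.3f}")
```

Note that at the same $2.20/hr price, the higher-cost stack earns a structurally lower margin; throughput is the other lever.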
GPU purchase: H100 SXM ~$25-30K, B200 ~$30-40K. Purchased GPUs amortize over 3-5 years — cheaper per hour if utilization is high, but they carry depreciation risk from technology obsolescence. Rented GPUs are OpEx at ~$2.00-3.00/hr mid-market (down from $7-8/hr peak).
GPU failure rate: ~2-5% annually, requiring spare buffer inventory. At 10,000 GPUs, expect 200-500 failures per year.
Power cost varies wildly: $0.03-0.05/kWh (Crusoe’s stranded energy) vs $0.08-0.12/kWh (grid in Northern Virginia). A single H100 draws ~700W under load. At $0.10/kWh = $0.07/hr per GPU. At 10,000 GPUs = $6.1M/year in electricity.
PUE (Power Usage Effectiveness): 1.1 means 10% cooling overhead; 1.4 means 40%. Liquid cooling pushes closer to 1.1; air cooling sits at 1.3-1.5. Every 0.1 improvement in PUE across a 100MW facility saves on the order of $5M/year at grid power rates.
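A sketch of the power math above; the wattage, rates, and fleet size are the figures from the text:

```python
def power_cost_per_gpu_hour(watts, dollars_per_kwh, pue=1.0):
    """Electricity cost of one GPU for one hour, including cooling overhead (PUE)."""
    return watts / 1000 * dollars_per_kwh * pue

per_hour = power_cost_per_gpu_hour(700, 0.10)    # H100 under load at grid rates
fleet_annual = per_hour * 10_000 * 8760          # 10,000 GPUs running 24/7

# PUE sensitivity: same fleet, liquid (1.1) vs air (1.4) cooling.
liquid = power_cost_per_gpu_hour(700, 0.10, pue=1.1) * 10_000 * 8760
air = power_cost_per_gpu_hour(700, 0.10, pue=1.4) * 10_000 * 8760

print(f"per GPU-hour: ${per_hour:.3f}")
print(f"fleet/year:   ${fleet_annual / 1e6:.1f}M")
print(f"air vs liquid cooling penalty: ${(air - liquid) / 1e6:.1f}M/year")
```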
InfiniBand for multi-GPU inference adds 15-25% to total cluster cost. For a 1,000 H100 cluster: $5-15M in NICs, switches, and cabling. NVIDIA/Mellanox near-monopoly limits price negotiation.
Storage for model weights, KV cache spill, logging: ~5-10% of compute cost.
→ How InfiniBand works in the pipeline
→ Training hardware costs comparison
Operational costs include SRE/DevOps staff, monitoring infrastructure, on-call rotations, and platform orchestration (Kubernetes, Slurm, auto-node-replacement).
GPU failures at 2-5% annually mean a 10,000-GPU fleet needs constant triage — detecting degraded GPUs, migrating workloads, RMA processing. This is where OpEx quietly compounds.
Throughput optimization IS margin expansion. If custom CUDA kernels serve 2x the tokens/sec on the same GPU, cost per token halves.
Two factories with identical machines. One runs at 30% capacity with long changeover times; the other runs at 85% with quick changeovers. Same capital cost, dramatically different unit economics.
Serving an 8B model is ~10x cheaper per token than 405B. The 405B model needs 8-16 GPUs (vs 1), performs 50x more matrix operations per token, and the larger KV cache per token reduces batch slots from 64+ to 8-16.
| Factor | 8B | 405B | Penalty |
|---|---|---|---|
| Weight data per token | ~16 GB | ~810 GB | ~50x |
| GPUs required | 1 | 8-16 | 8-16x cost |
| Communication overhead | Zero | 126 all-reduce ops | Pure penalty |
| Batch size | 64+ | 8-16 | 4-8x less throughput |
| Cost / M tokens | $0.03-0.05 | $0.50-1.00 | 10-20x |
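Plugging the table into the cost-per-token formula shows where the 10-20x gap comes from. The per-model throughput numbers below are assumptions chosen to be consistent with the table, not measurements:

```python
def cost_per_million_tokens(gpus, cost_per_gpu_hour, tokens_per_second):
    """Provider cost to serve one million tokens on a multi-GPU deployment."""
    tokens_per_hour = tokens_per_second * 3600
    return gpus * cost_per_gpu_hour / tokens_per_hour * 1e6

small = cost_per_million_tokens(1, 1.33, 10_000)   # 8B: one GPU, large batch
large = cost_per_million_tokens(8, 1.33, 4_000)    # 405B: 8 GPUs, small batch
print(f"8B:   ${small:.3f}/M tokens")
print(f"405B: ${large:.2f}/M tokens  ({large / small:.0f}x)")
```

The gap compounds: more GPUs in the numerator and fewer tokens per second in the denominator.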
The same model on the same GPU can have 3-5x throughput variance between naive and fully optimized serving. This includes custom attention kernels (FlashAttention, PagedAttention), quantization (FP16 → INT4), continuous batching, speculative decoding, and prefix caching.
This is why inference platform companies like Fireworks exist — their serving infrastructure (FireAttention, MemoryAlloy) extracts dramatically more tokens/sec from the same hardware.
→ Performance Optimization Stack details
A GPU serving one request at a time wastes most of its compute capacity. Continuous batching enables 80-90% utilization by adding new requests to an in-flight batch as slots free up.
Request characteristics matter: long context uses more KV cache memory, reducing batch slots. Output tokens cost more than input tokens because decode is sequential while prefill is parallel.
→ How continuous batching works
Reasoning models add internal reasoning streams on top of visible output. Effective capacity planning must split visible tokens from reasoning tokens, then cap runtime via reasoning effort or thinking budget controls.
Interleaved thinking increases scheduler contention because reasoning and tool calls share the same decode loop. Preserved thinking can reduce repeated work across turns, but raises carried-context cost if history policies are too permissive.
→ Decode behavior for reasoning models
Hardware generation creates step-function improvements. B200 with FP4 serves the same model at dramatically higher throughput than H100 at FP16. Blackwell delivers 3-5x performance per dollar vs Hopper.
This drives the technology obsolescence risk in GPU ownership — an H100 purchased today may be economically obsolete before fully depreciated.
Frontier inference cost is declining ~10x annually. Reserved contracts are economically similar to fixed-rate swaps on GPU compute prices.
Airline pricing — first class, economy, standby, and corporate contracts all sell the same seat-mile at wildly different prices based on flexibility, commitment, and timing.
| Segment | Volume | Pricing Model | Margin |
|---|---|---|---|
| Hobbyist / prototyping | <1M tok/day | Serverless per-token, free tier | Low/negative |
| Growth startup | 1-100M tok/day | Serverless → on-demand | Medium, expanding |
| Enterprise | 100M+ tok/day | Reserved capacity, custom | High |
| Batch / offline | Large, flexible | Batch pricing (50% off) | Medium (fills idle) |
GPT-3.5 equivalent pricing: $20/M tokens (late 2022) → $0.40/M tokens (2025). A 50x decline in ~2.5 years. Reserved contracts lock in today’s price — if costs keep falling, the provider profits on the spread while customers overpay for certainty.
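The two data points imply an annual price multiplier, which is what a reserved contract effectively locks against. A rough sketch (the 2.5-year window and endpoint prices are from the text; the interpolation is an assumption):

```python
p_start, p_end, years = 20.0, 0.40, 2.5               # $/M tokens, late 2022 -> 2025
annual_multiplier = (p_end / p_start) ** (1 / years)  # ~0.21, i.e. ~5x cheaper per year

# A 1-year contract locked at the starting price vs the (declining) spot price:
contract = p_start
spot_after_1yr = p_start * annual_multiplier
print(f"annual price multiplier: {annual_multiplier:.2f}")
print(f"after 1 year, contract pays ${contract:.2f}/M vs spot ${spot_after_1yr:.2f}/M")
```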
Standard for API providers. Input/output split is critical: output tokens cost 2-4x more to serve because decode is sequential. Without the split, adverse selection occurs — output-heavy workloads subsidized by input-heavy ones.
Cached input pricing (~50% discount) incentivizes consistent prefixes, improving server-side prefix cache hit rates. This is pricing that shapes behavior to reduce costs.
→ Rate limits as pricing levers
For reasoning models, many teams now separate visible output from reasoning streams (for example reasoning_content) and apply policy caps with effort, budget, and history controls.
Fine-tuned offerings add another pricing layer: base model usage + adapter premium. Fireworks-style LoRA deployment economics (including multi-adapter serving and one-click adapter deployment) allow providers to segment by workflow quality, not just raw token volume.
→ Fine-tuning methods and deployment implications
Revenue predictability for both sides. Reserved contracts are economically similar to fixed-rate swaps: the provider receives fixed payments and effectively pays floating (the market cost of delivering compute). In a deflationary environment, existing contracts become more valuable.
The real risk isn’t the contract price — it’s what the contract signals about capacity planning. Contracted capacity is hedged; uncontracted capacity is an outright bet on future pricing.
Like standby airline tickets — fills idle capacity. Customer gives up latency guarantees in exchange for significant discount. Provider gains utilization in off-peak periods.
Batch workloads smooth demand curves: if real-time peaks at 2pm, batch jobs absorb 2am-6am idle capacity. This improves overall fleet utilization from 40-50% to 80-90%.
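A toy 24-hour demand profile shows how batch backlog fills the valleys. The profile shape and backlog size are invented for illustration:

```python
# Hourly real-time utilization (fraction of fleet busy) over a day, assumed shape.
realtime = [0.2] * 6 + [0.5] * 4 + [0.8] * 6 + [0.5] * 4 + [0.3] * 4

cap = 0.9                 # leave headroom for bursts
batch_backlog = 9.0       # queued batch work, in fleet-hours
filled, remaining = [], batch_backlog
for u in realtime:
    add = min(cap - u, remaining)   # greedily pour batch work into idle capacity
    filled.append(u + add)
    remaining -= add

base = sum(realtime) / 24
blended = sum(filled) / 24
print(f"real-time only: {base:.0%} utilization")
print(f"with batch:     {blended:.0%} utilization")
```

Real schedulers also handle preemption and latency classes, but the utilization math is this simple at its core.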
Land and expand: Price serverless tier aggressively (even below cost) to acquire developers, monetize at scale when usage grows.
Value-based: Voice agent inference priced per-minute, not per-token. Aligns with customer’s value perception — they think in “minutes of agent time,” not tokens.
Competitive moat: If you have 2x throughput advantage (FireAttention), price 30% below competitors while maintaining better margins. Your optimization is their impossibility.
Managed inference sells outcomes at 55-70% gross margin via statistical multiplexing. GPU rental sells infrastructure at 25-40% margin with simpler operations but massive CapEx.
Uber vs Hertz. Uber sells rides (outcomes) and pools cars across thousands of passengers. Hertz rents you the car — you worry about driving. Both make money on vehicles, but the economics are completely different.
| Dimension | Managed Inference | GPU Rental |
|---|---|---|
| What you sell | Tokens, completions, minutes | Raw GPU-hours |
| Customer thinks about | Outcomes | Infrastructure |
| Gross margin | 55-70% | 25-40% |
| Key advantage | Statistical multiplexing | Simpler operations |
| CapEx intensity | Lower (can rent GPUs) | Very high |
| R&D cost | High (serving stack) | Moderate (platform) |
| Risk | Correlated demand spikes | Utilization & pricing deflation |
Statistical multiplexing — Customer A peaks at 2pm, Customer B at 6pm, Customer C runs batch overnight. Pooling across thousands of customers enables 80-90% GPU utilization, far higher than any single customer achieves.
Like insurance pooling — the law of large numbers applies to token traffic. The risk: correlated demand spikes. When a new model drops or a viral AI demo happens, everyone hits the API simultaneously. This is the “catastrophic event” requiring burst capacity or graceful degradation.
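A quick simulation of the pooling effect: 200 hypothetical customers with peaks scattered across the day. The profile shapes and sizes are made up; the point is that the pooled peak sits far below the sum of individual peaks:

```python
import random

random.seed(42)
HOURS = 24
customers = []
for _ in range(200):
    peak_hour = random.randrange(HOURS)
    peak_gpus = random.uniform(5, 50)
    # Demand decays linearly away from each customer's peak hour (circular day).
    profile = []
    for h in range(HOURS):
        dist = min(abs(h - peak_hour), HOURS - abs(h - peak_hour))
        profile.append(peak_gpus * max(0.1, 1 - dist / 6))
    customers.append(profile)

pooled = [sum(c[h] for c in customers) for h in range(HOURS)]
pooled_peak = max(pooled)
sum_of_peaks = sum(max(c) for c in customers)

# Provisioning for the pooled peak needs far fewer GPUs than per-customer
# peaks, and the pooled fleet runs much closer to flat-out.
print(f"capacity needed, pooled:    {pooled_peak:,.0f} GPUs")
print(f"capacity needed, dedicated: {sum_of_peaks:,.0f} GPUs")
print(f"pooled utilization:         {sum(pooled) / HOURS / pooled_peak:.0%}")
```

Correlated demand (everyone peaking at once) is exactly the case this simulation excludes, which is why it is the business model's tail risk.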
→ How continuous batching enables this
Input vs output split: Output tokens cost 2-4x more to serve. Without the split, adverse selection occurs — output-heavy workloads are subsidized.
Cached input pricing: ~50% discount incentivizes consistent prefixes, improving prefix cache hit rates. Pricing that shapes behavior.
Model-specific pricing: Based on cost to serve, competitive pricing, demand elasticity, and strategic value. Popular models subsidize long-tail catalog with sparse traffic.
1. Model catalog: Which models to add and when — each has hosting cost whether used or not. Warm GPUs for a model nobody calls is pure waste.
2. Rate limits & SLA tiers: Maps to burst capacity reserved per customer. Higher tier = more dedicated headroom = higher price.
3. Model deprecation: Old models eat GPU memory, but enterprise customers depend on them. Migration timelines are diplomatic minefields.
4. Optimization pass-through: When your team ships a 2x throughput improvement, do you pass savings to customers (growth) or keep as margin (profitability)?
When Crusoe launches Managed Inference alongside GPU rental, they create internal tension. If MemoryAlloy delivers 9.9x better TTFT and 81% cost reduction, why would any inference customer rent raw GPUs?
The resolution: GPU rental increasingly serves training customers and custom workloads. Managed inference captures inference demand at higher margin. Total revenue per GPU potentially goes up — but the PM must model the revenue migration carefully.
Buying a GPU = being long a depreciating asset with uncertain future value. Renting = buying a monthly call option on compute capacity. The optimal strategy is a barbell: own base-load, rent flexibility.
Electric utilities: own baseload power plants (nuclear, hydro — predictable, cheap per kWh) and buy peaking capacity on the spot market (gas turbines — flexible, expensive per kWh).
| Scenario | H100 Residual (3yr) | Value | Implication |
|---|---|---|---|
| Bull case | 40% | ~$11K | Ownership strongly favored |
| Base case | 20% | ~$5.5K | Ownership favored at high utilization |
| Bear case | 5% | ~$1.4K | Next-gen makes it essentially worthless |
Buy side TCO per GPU-hour: Purchase price amortized over useful life + cost of capital + power/cooling + maintenance (2-5% annual failure rate) − residual/salvage value.
Depreciation tension: H100 straight-line over 3 years = ~$0.95/hr. Over 5 years = ~$0.57/hr. But GPU tech cycles are accelerating: H100 (2022) → H200 (2024) → B200 (2025). If Blackwell delivers 3-5x performance per dollar, H100 depreciating over 5 years is economically obsolete before fully depreciated.
Rent side: ~$2.00-3.00/hr for H100. No upfront CapEx. Flexibility to scale. Obsolescence risk sits with the lessor — but you’re paying their margin.
If blended WACC for GPU purchases is 12%, a $28K H100 carries $3,360/year in capital charges alone — adding ~$0.38/hr and pushing the amortized hardware cost from ~$1.10/hr to ~$1.48/hr.
Breakeven utilization moves from 44% to ~59%. If funded with pure equity at 25%, breakeven jumps to ~76%. Interest rate environment matters: 300-400 bps difference on $1B GPU purchase = $30-40M/year in additional interest expense.
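A simplified breakeven sketch that reproduces the figures above under assumed inputs (a $28K H100, 3-year amortization, a $2.50/hr rental rate, variable costs ignored): owning beats renting once utilized hours cover the fixed annual cost.

```python
HOURS_PER_YEAR = 8760

def breakeven_utilization(purchase_price, amort_years, capital_rate, rent_per_hour):
    """Utilization at which owning a GPU costs the same per used hour as renting.

    Fixed annual cost = straight-line amortization plus a capital charge on the
    purchase price; power and ops are left out for simplicity.
    """
    fixed_annual = purchase_price / amort_years + purchase_price * capital_rate
    return fixed_annual / (HOURS_PER_YEAR * rent_per_hour)

no_capital_charge = breakeven_utilization(28_000, 3, 0.00, 2.50)  # ~43%
blended_wacc_12 = breakeven_utilization(28_000, 3, 0.12, 2.50)    # ~58%
pure_equity_25 = breakeven_utilization(28_000, 3, 0.25, 2.50)     # ~75%
print(f"{no_capital_charge:.0%} -> {blended_wacc_12:.0%} -> {pure_equity_25:.0%}")
```

The capital charge does nothing to the hardware; it only raises the bar the hardware must clear.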
Renting = buying a monthly call option on compute. You pay a premium but maintain optionality to switch hardware, scale down, or pivot. Option value increases when volatility is high (AI hardware changing every 12-18 months), time horizon is uncertain, and interest rates are high.
Buying is attractive when: Demand is highly predictable (Crusoe’s 15-year Abilene lease), structural cost advantage exists (cheap power extends economic life), and the asset can be redeployed across use cases.
→ GPU Memory Hierarchy details
The optimal strategy: heavy ownership of base-load capacity funded by low-cost infrastructure debt secured against long-term contracts, combined with rental/spot capacity for flexibility.
Like utilities: own baseload plants, buy peaking capacity on the spot market. The owned portion provides cost advantage; the rented portion provides optionality. The ratio depends on demand predictability and cost of capital.
Crusoe's ~$50/kW/month energy advantage at a 100MW facility = $60M/year in structural cost savings that flow directly to margin or competitive pricing.
Real estate for aluminum smelters — you don’t price by square footage because the smelter’s value is entirely determined by access to cheap electricity. Same with AI data centers.
| Facility Type | $/kW/month |
|---|---|
| Wholesale colocation (legacy) | $80-120 |
| Retail colocation | $120-180 |
| AI-optimized facility | $150-250+ |
| Hyperscaler self-build | $50-80 effective |
Traditional operator at $0.10/kWh: 1 kW continuous for a month = $72 just in electricity. Out of $150/kW/month price, nearly half is power. Crusoe at $0.03/kWh: same 1 kW = $22/month.
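The per-kW math above, generalized (730 is the approximate hours in a month; the $150/kW/month colo price is from the table):

```python
HOURS_PER_MONTH = 730

def monthly_power_cost_per_kw(dollars_per_kwh, pue=1.0):
    """Electricity cost of 1 kW of continuous IT load for one month."""
    return dollars_per_kwh * HOURS_PER_MONTH * pue

grid = monthly_power_cost_per_kw(0.10)     # ~$73/kW/month
crusoe = monthly_power_cost_per_kw(0.03)   # ~$22/kW/month
colo_price = 150.0
print(f"grid power:  ${grid:.0f}/kW/month ({grid / colo_price:.0%} of the colo price)")
print(f"cheap power: ${crusoe:.0f}/kW/month")
```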
Legacy server rack: 2-4 kW. AI rack with 8 H100s: 40-70 kW. GB200 NVL72 rack: 120-140 kW. That’s 30-70x more power but roughly the same physical footprint. Pricing by square footage breaks completely.
Power delivery infrastructure alone costs millions: transformers ($1-5M each), switchgear, UPS, backup generators, distribution. Redundancy requirements (N+1 or 2N) mean 100 kW provisioned → 200 kW built.
Every watt consumed becomes a watt of heat to remove. At 140 kW/rack, air cooling physically cannot keep up — liquid cooling is required. Cost scales linearly with kW.
PUE captures this overhead. PUE of 1.1 (liquid) means 10% cooling overhead. PUE of 1.4 (air) means 40%. The difference at 100MW: an extra 30MW consumed just for cooling, or ~$15M/year at grid rates.
The $/kW/month price bundles: power delivery infrastructure (transformers, switchgear, UPS, generators, distribution), cooling capacity (linearly proportional to power), redundancy (N+1 or 2N), and physical space (essentially free at this point — tiny fraction of total cost).
This is why the metric is $/kW/month: it captures the actual scarce resource (power capacity) rather than the abundant one (floor space).
Equity costs 3-4x more than debt because equity holders bear residual risk with no contractual protection. The tax shield on debt interest makes the gap even wider.
A building with floors. Debt holders hold the ground floor — closest to the exit, first out when the building has to be evacuated. Equity holders live in the penthouse — great view, but they sway with every tremor. Higher floor = higher risk = higher rent.
| Dimension | Debt | Equity |
|---|---|---|
| Priority in liquidation | First (senior) | Last (residual) |
| Cash flows | Contractual (fixed interest) | Residual (dividends optional) |
| Upside | Capped at interest rate | Uncapped |
| Downside | Protected by covenants | Can lose everything |
| Tax treatment | Interest is deductible | Returns are not |
| Typical cost | 7-10% | 20-30%+ |
Interest payments are tax-deductible. At 21% corporate tax rate, 8% interest → ~6.3% after-tax cost. Equity returns have no such benefit.
On $5B of debt at 8%, the tax shield is worth $84M/year — pure value creation from choosing debt over equity for the same investment.
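Both numbers fall out of the same two-line calculation:

```python
def after_tax_cost_of_debt(interest_rate, tax_rate):
    """Effective cost of debt once interest deductibility is counted."""
    return interest_rate * (1 - tax_rate)

def annual_tax_shield(principal, interest_rate, tax_rate):
    """Taxes avoided each year because interest is deductible."""
    return principal * interest_rate * tax_rate

cost = after_tax_cost_of_debt(0.08, 0.21)          # ~6.3%
shield = annual_tax_shield(5e9, 0.08, 0.21)        # $84M/year
print(f"after-tax cost of debt: {cost:.2%}")
print(f"tax shield on $5B:      ${shield / 1e6:.0f}M/year")
```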
Debt holder’s max upside is the interest rate (8%). They price for downside protection via covenants, collateral, and priority. Equity investors face uncapped upside and uncapped downside — they need much higher expected returns to justify the risk.
Information asymmetry compounds this: debt holders protect themselves with covenants, while equity holders have board seats but can’t contractually force profitability. More trust required = riskier = more expensive.
If debt is cheaper, why not fund everything with debt? Three limits:
1. Financial distress risk: Too much leverage makes a temporary downturn existential. Missing one interest payment can trigger default cascades.
2. Debt capacity limits: Lenders won’t fund beyond cash flow coverage. The DSCR sets a hard ceiling.
3. Rising marginal cost: First billion at 8%, fifth billion at 12%. At some point, additional debt costs more than equity.
A 40% commitment discount looks expensive in isolation, but the contracted revenue it creates can unlock debt capacity whose NPV far exceeds the discount given. Pricing decisions directly affect capital structure.
Getting a mortgage. The bank doesn’t just look at the house (collateral) — they look at your salary and employment contract (contracted revenue). A tenured professor with $100K salary gets a bigger mortgage at a lower rate than a freelancer earning $150K with no contract.
Lenders size loans using DSCR — want cash flow at 1.3-2.0x annual debt service.
Without contracted revenue: Lender underwrites to conservative $300M → $150M FCF → supports ~$100M/year debt service → ~$1-1.5B total debt capacity.
With 15-year Oracle contract at $600M/year: Lender underwrites to $600M → $400M FCF → supports ~$267M/year debt service → ~$3-4B total debt capacity. Same facility, dramatically different borrowing.
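The underwriting math above is consistent with interest-only sizing at an assumed ~8% rate — a simplification, but it reproduces both capacity figures:

```python
def debt_capacity(free_cash_flow, dscr, interest_rate):
    """Interest-only debt capacity: FCF must cover debt service at the DSCR."""
    max_debt_service = free_cash_flow / dscr
    return max_debt_service / interest_rate

uncontracted = debt_capacity(150e6, 1.5, 0.08)   # ~$1.25B
contracted = debt_capacity(400e6, 1.5, 0.08)     # ~$3.3B
print(f"without contract:     ${uncontracted / 1e9:.2f}B")
print(f"with Oracle contract: ${contracted / 1e9:.2f}B ({contracted / uncontracted:.1f}x)")
```

Amortizing structures lower the capacity somewhat, but the contracted-vs-uncontracted multiple is the same.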
Contracted revenue with investment-grade counterparty compresses spread: SOFR + 400bps → SOFR + 250bps. On $5B = $75M/year in interest savings.
Looser covenants: fewer restrictions on additional borrowing, less stringent maintenance ratios, more operational flexibility. Longer tenor: 15-year contracted revenue supports 10-12 year debt vs 3-5 years for uncontracted.
Lenders structure assignment of contracts as security interest. If Crusoe defaults, JPMorgan could step into Crusoe’s position and receive Oracle’s lease payments directly.
Like mortgage-backed securities — the payment streams themselves are the security. Not the physical assets, but the right to receive contracted cash flows.
Every 1-year or 3-year GPU reservation contract signed makes the debt package more attractive to lenders → cheaper debt → lower WACC → more competitive pricing → more customers → more contracts → even more debt capacity.
This is why commitment discounts aren’t just about revenue: they’re a capital structure optimization. The PM must model the full-cycle NPV, not just the direct pricing impact.
Debt requires predictability. Equity tolerates uncertainty. Crusoe runs two capital structures in one company: project-finance-funded infrastructure + venture-funded software.
A person’s financial life. College student = 100% “equity” (parental support, scholarships). First job = some debt (car loan). Established career = mortgage, credit lines. Retired = optimized portfolio. More predictable income unlocks more leverage.
Series A: No revenue, no assets, no track record. Might not exist in 18 months. Cost of equity: 50-100%+ implied (VCs need 100x potential). Debt is unavailable. Mix: 95-100% equity. Exception: venture debt (20-30% of last equity round) with warrants.
Series B: $5-20M ARR, real product, paying customers. Cost of equity: 30-50%. Cost of debt: 12-15% (venture debt with warrants). Mix: 80-90% equity. Equipment financing becomes possible (80% LTV on GPU purchases).
Series C/D: $50-200M+ ARR, proven unit economics, enterprise contracts. Cost of equity: 15-25%. Cost of debt: 8-12% (term loans, asset-backed). Mix: 60-80% equity. CoreWeave pioneered billions in GPU-backed debt at this stage.
Pre-IPO: $200M-1B+ revenue, clear profitability path. Cost of equity: 12-20%. Cost of debt: 6-9%. Mix: 40-60% equity. Convertible notes popular: lower interest (2-5%) with embedded call option. Crusoe is here now (Series E, $10B+).
Cost of equity drops dramatically: 10-15% (liquidity, transparency, diversifiability). Cost of debt: 4-7% (investment-grade bonds, commercial paper). Gap narrows but never closes — debt holders are always paid first.
Active capital management: share buybacks, debt-funded buybacks, dividend policy, credit rating management. The CFO becomes a portfolio manager optimizing the capital structure continuously.
Crusoe runs two capital structures simultaneously:
Project-finance infrastructure: Abilene’s $9.6B debt + $5B equity. Low cost of capital, asset-heavy, contracted cash flows. Like a utility or pipeline company.
Venture-funded software: Cloud platform, managed inference. High cost of capital, asset-light, uncertain returns. Like a typical tech startup.
The PM must understand which investments belong to which bucket. Managed inference feature ($20M) → equity-funded, needs venture-scale returns. Data center with signed contract → leverage cheap debt, lower hurdle rate.