What it costs to train frontier AI models — and how the industry finances it.
Training costs are doubling every ~8 months. GPT-4 cost ~$100M; frontier models trained by 2027 are expected to exceed $1B. Meanwhile, cloud GPU rental prices have fallen 60-70%, creating a growing gap between buying and renting.
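The doubling claim can be sanity-checked with a quick extrapolation. A minimal sketch, using only the ~$100M GPT-4 figure and the ~8-month doubling period cited above:

```python
import math

DOUBLING_MONTHS = 8        # training-cost doubling period cited above
GPT4_COST = 100e6          # ~$100M for GPT-4
FRONTIER = 1e9             # the $1B frontier threshold

# Months for costs to grow 10x at a fixed doubling rate.
months = DOUBLING_MONTHS * math.log2(FRONTIER / GPT4_COST)
print(f"$100M -> $1B in ~{months:.0f} months ({months / 12:.1f} years)")
# ~27 months: from a 2023 baseline, well before 2027
```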
| GPU | Price | Notes |
|---|---|---|
| H100 SXM | $25-31K | The workhorse of 2024 training |
| B200 | $30-35K | 2x H100 performance |
| GB200 NVL72 | ~$3M | 72-GPU rack system |
H100 cloud pricing fell from $8/hr (early 2024) to ~$1.50/hr (late 2025) — a 60-70% drop driven by massive supply buildout and competition from CoreWeave, Lambda, and hyperscalers.
| Model | Training Cost | GPUs |
|---|---|---|
| GPT-4 | $78-100M | ~25,000 A100s |
| Llama 3 405B | ~$60M | 16,384 H100s |
| DeepSeek V3 | $5.5M (GPU only) | 2,048 H800s |
| Gemini Ultra | ~$191M | TPU v4 pods |
Training cost breakdown: chips 21-30% · server/networking 22-30% · staff 29-49% · energy 2-6%. Staff costs dominate at smaller scales; hardware dominates at frontier scale.
Chinchilla-optimal training (20 tokens/parameter) minimizes training cost but ignores inference cost. The industry now over-trains smaller models — Llama 3 70B at ~200 tokens/param — to shift cost from ongoing inference to one-time training.
Building a factory vs. running it. Chinchilla minimizes factory construction cost but doesn’t consider that a smaller, better-trained factory may produce cheaper widgets for years to come.
Chinchilla scaling laws (2022): for a fixed compute budget, the optimal allocation is 20 tokens per parameter. A 70B model should train on 1.4T tokens. This minimizes training loss per FLOP.
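A back-of-envelope version of the rule, using the standard C ≈ 6·N·D approximation for dense-transformer training compute (the 6ND factor is a common heuristic, not stated in the text above):

```python
# Chinchilla rule of thumb: ~20 training tokens per parameter.
# Training compute is commonly approximated as C = 6 * N * D FLOPs
# for a dense transformer with N parameters trained on D tokens.
def chinchilla_tokens(params: float) -> float:
    return 20 * params

def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

N = 70e9                      # 70B-parameter model
D = chinchilla_tokens(N)      # 1.4e12 -> the 1.4T tokens quoted above
print(f"{D / 1e12:.1f}T tokens, {train_flops(N, D):.2e} FLOPs")
```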
Minimizing cost per training run ignores inference cost. A model’s lifetime inference cost can exceed its training cost many times over (GPT-4: ~$100M training vs. an estimated ~$2.3B inference over 18 months). Optimizing for training alone is locally optimal but globally wasteful.
| Model | Params | Tokens | Tokens/Param |
|---|---|---|---|
| Chinchilla | 70B | 1.4T | 20x (optimal) |
| Llama 3 405B | 405B | 15T | 37x |
| Llama 3 8B | 8B | 15T | 1,875x |
| Qwen3-0.6B | 0.6B | 36T | 60,000x |
The industry shift: spend more on training (one-time) to make every inference call cheaper (ongoing). Over-training smaller models by 10-1,000x Chinchilla-optimal is now standard practice.
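A toy comparison makes the tradeoff concrete. All dollar figures and the query volume below are illustrative assumptions (not from this article), chosen only to show the mechanism:

```python
# Lifetime cost = one-time training + ongoing inference.
# Over-training a smaller model raises the first term and shrinks the second.
def lifetime_cost(train_cost: float, cost_per_query: float, queries: float) -> float:
    return train_cost + cost_per_query * queries

QUERIES = 50e9                                  # assumed lifetime query volume
big   = lifetime_cost(60e6, 0.002, QUERIES)     # Chinchilla-optimal large model
small = lifetime_cost(90e6, 0.0005, QUERIES)    # over-trained small model
print(f"large: ${big / 1e6:.0f}M lifetime, over-trained small: ${small / 1e6:.0f}M")
```

Under these assumptions, the model that cost $30M more to train comes out $45M cheaper over its deployed lifetime.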
OpenAI spent $7B on R&D in one year, with over 70% going to failed experiments. Less than $1B was for the final successful training run. Frontier AI research is dominated by the cost of exploration, not exploitation.
OpenAI 2024 R&D spend: ~$7B total. The vast majority (>70%) went to experimentation — architecture search, hyperparameter tuning, failed training runs, and data mixture experiments. The final production training run for a model like GPT-4 costs <$1B of that total.
This mirrors pharmaceutical R&D: the drug that ships costs a fraction of the total research budget that produced hundreds of failed candidates.
For 99% of organizations, fine-tuning or API access is the right choice. Foundation model training is only viable for the largest labs with differentiated data or architecture advantages.
| Approach | Cost Range | When It Makes Sense |
|---|---|---|
| Foundation training | $50M-$500M+ | Differentiated architecture, massive data moat |
| Full fine-tuning | ~$50K/run (7B model) | Need deep domain specialization |
| LoRA/QLoRA | $300-$3,000/run | 90-95% of full fine-tuning quality at ~1% of the cost |
| API inference | Pay per token | No training cost, fastest time to value |
Build from scratch if: you have unique data at scale, need architecture control, and can sustain $100M+/year in compute spend (OpenAI, Anthropic, Google, Meta).
Fine-tune if: you need domain-specific behavior, proprietary knowledge injection, or format/style control that prompting can’t achieve.
Use APIs if: your use case can be solved with prompting + RAG, you need rapid iteration, or you can’t justify the fixed cost of training infrastructure.
Modern managed stacks now cover the full post-training ladder: supervised text/chat/vision fine-tuning, DPO (direct preference optimization), and RFT (reinforcement fine-tuning). This collapses integration overhead compared with stitching multiple vendors together.
Fireworks-specific levers that change unit economics: one-click LoRA deployment, support for up to 100 LoRA adapters per base model on a single deployment, and secure enterprise modes for customer-controlled data boundaries.
Operationally, synthetic data generation plus built-in evaluation loops reduce labeling and experimentation costs, shifting budgets from infrastructure plumbing to model quality iteration.
Training Llama 3 405B would have cost ~$483M at AWS on-demand rates vs. ~$60M on Meta’s own cluster. The ~8x cost advantage of on-prem at scale explains why every major lab is building its own data centers.
| Lab | Compute Source | Spend / Funding |
|---|---|---|
| OpenAI | Microsoft partnership | ~$7B/yr R&D |
| Anthropic | AWS + GCP partnerships | $33.7B raised |
| Google DeepMind | Internal TPU clusters | $85B total AI CapEx (2025) |
| Meta AI | Own GPU clusters | $68B total AI CapEx (2025) |
Together AI ($3.3B valuation): managed training and inference platform. Anyscale ($1B+): Ray-based distributed training. Modal ($1.1B): serverless GPU compute with training focus. These companies provide the infrastructure layer for organizations that want to train without building their own clusters.
Cloud = buying a call option on compute (pay premium for flexibility). On-prem = owning the asset (3-8x cheaper if utilization stays above 60-70%). The right answer depends on demand predictability and time horizon.
Cloud advantages: No upfront CapEx. Access to latest hardware without procurement delays. Burst capacity for experiments. Scale down when not training. Option value: switch hardware as next-gen GPUs ship every 12-18 months.
On-prem advantages: 3-8x cheaper for sustained workloads at high utilization. Full control over hardware configuration, networking topology, and security. No egress charges for massive data movement. Predictable cost structure for financial planning.
On-prem hidden costs: Data center space and construction. Power infrastructure and cooling systems. Network engineering staff. 5-6 month GPU procurement lead times. Hardware maintenance and spare inventory. Opportunity cost of capital locked in depreciating assets.
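The 60-70% utilization threshold falls out of simple amortization arithmetic. A sketch with assumed per-GPU numbers (purchase price, annual opex, lifespan, and cloud rate are all illustrative):

```python
# Owning wins once the amortized cost per productive GPU-hour
# drops below the cloud rental rate.
def owned_cost_per_hour(capex: float, opex_per_year: float,
                        years: int, utilization: float) -> float:
    productive_hours = years * 8760 * utilization
    return (capex + opex_per_year * years) / productive_hours

CAPEX, OPEX, YEARS = 30_000, 3_000, 4   # per-GPU figures, assumed
CLOUD_RATE = 2.00                       # $/GPU-hr rental, assumed

for util in (0.3, 0.5, 0.7, 0.9):
    rate = owned_cost_per_hour(CAPEX, OPEX, YEARS, util)
    print(f"utilization {util:.0%}: ${rate:.2f}/GPU-hr owned")

# Utilization at which owning matches the cloud rate:
breakeven = (CAPEX + OPEX * YEARS) / (YEARS * 8760 * CLOUD_RATE)
print(f"break-even utilization: {breakeven:.0%}")
```

With these inputs the break-even lands near 60%, consistent with the threshold above; cheaper cloud rates push it higher, longer hardware life pushes it lower.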
GPU-backed financing enabled the AI infrastructure buildout, but the underlying collateral is depreciating rapidly: H100 rentals fell 60-70%, and hardware faces 30-40% year-one depreciation. Lenders are taking concentrated technology risk.
CoreWeave: $7.6B in GPU-backed debt, $14.6B in equipment assets. IPO’d at $40, reached $183. The poster child for GPU-as-collateral financing.
Lambda: $1.5B sale-leaseback — sell GPUs to a lessor, lease them back. Frees capital while retaining operational control.
Creative structures: 5-year GPU leases, synthetic GPU derivatives, sale-leasebacks backed by contracted cloud revenue.
H100 cloud rental fell 60-70% in 18 months. Hardware faces 30-40% year-one depreciation. Each new generation (B200, GB200) makes the previous one less economically viable. Lenders underwriting 5-year GPU loans face significant residual value risk.
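Residual-value risk can be sketched with constant-rate depreciation. The 35% annual rate and $28K purchase price below are assumptions consistent with the figures above:

```python
# Residual value under constant annual depreciation.
def residual_value(price: float, annual_rate: float, years: int) -> float:
    return price * (1 - annual_rate) ** years

H100_PRICE = 28_000    # assumed purchase price within the quoted range
for year in range(1, 6):
    value = residual_value(H100_PRICE, 0.35, year)
    print(f"year {year}: ${value:,.0f} ({value / H100_PRICE:.0%} of cost)")
```

Under these assumptions, a lender underwriting a 5-year loan against the hardware is left with collateral worth roughly a tenth of its original price.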
AI funding has reached unprecedented scale: Anthropic raised $30B at $380B valuation, OpenAI planned $40B+ rounds, and Big Tech collectively committed $405B in 2025 AI CapEx. The GPU-rich vs GPU-poor dynamic is the defining market structure.
| Company | Amount | Valuation |
|---|---|---|
| OpenAI | $40B+ | ~$300B |
| Anthropic | $30B | $380B (Feb 2026) |
| xAI | $6B | $50B |
| CoreWeave | $7.6B debt + IPO | $35B+ at peak |
2025 CapEx commitments: Google $85B, Meta $68B, Microsoft $80B, Amazon $100B+, Apple $500M (data centers). Morgan Stanley projects $3T cumulative AI infrastructure spending through 2029.
OpenAI’s $1T infrastructure plan through 2035 includes custom data centers, chip development (with Samsung/TSMC), and a global network of training clusters.
GPT-4’s lifetime inference cost (~$2.3B) is more than 20x its training cost (~$100M). As models are deployed at scale, inference becomes the dominant cost driver. The AI inference market is projected to grow from $106B (2025) to $255B by 2030.
| Year | Training | Inference |
|---|---|---|
| 2023 | ~67% | ~33% |
| 2025 | ~50% | ~50% |
| 2026 (proj.) | ~33% | ~67% |
Reasoning-capable deployment modes (interleaved thinking, preserved reasoning context, tool-augmented loops) increase runtime decode work per request. Even when training cost is unchanged, inference spend grows faster because each query executes more steps.
Cost controls increasingly move to runtime policy: reasoning effort levels, thinking-token budgets, and history retention policies. In practice, product pricing and scheduler policy now matter as much as model architecture for total lifecycle margin.
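The runtime-policy point reduces to token arithmetic: thinking tokens are decoded (and paid for) like output tokens, so the budget directly multiplies per-request cost. The price and token counts below are illustrative assumptions:

```python
# Per-request serving cost as a function of the thinking-token budget.
def request_cost(output_tokens: int, thinking_tokens: int,
                 price_per_1m: float) -> float:
    return (output_tokens + thinking_tokens) * price_per_1m / 1e6

PRICE = 10.0                                  # $/1M decoded tokens, assumed
plain     = request_cost(500, 0, PRICE)       # direct answer
reasoning = request_cost(500, 4_000, PRICE)   # same answer + thinking budget
print(f"plain: ${plain:.4f}/req, reasoning: ${reasoning:.4f}/req "
      f"({reasoning / plain:.0f}x)")
```

Same model, same training cost, 9x the unit cost per query in this example — which is why thinking budgets and history retention are now first-order margin levers.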
Total AI industry spend in 2025: ~$527B. Total AI revenue: ~$51B. That’s a 10:1 spend-to-revenue ratio, the largest infrastructure-to-revenue gap since the early days of cloud computing. The bet: inference revenue at scale will eventually justify the capital deployed.