Training Large Language Models: Pretraining, Alignment, and Scaling Modern AI
Miles Thornton
Book 3 · #3 · ★ 4.8 (2.4k reviews)
Pages: 602
Language: English
Published: 2026
New edition
$4.99
Book introduction
Most large language model training runs don't crash because of bad math—they crash because of bad engineering. The difference between a successful 1000-GPU run and a costly failure often comes down to understanding how compute, memory, and bandwidth interact under real hardware constraints. This book exists to close that gap.
Training Large Language Models: Pretraining, Alignment, and Scaling Modern AI by Miles Thornton is a comprehensive, engineering-first reference that takes you from the physical economics of GPU clusters to the mathematical intricacies of optimizer dynamics. It systematically covers the entire LLM lifecycle: data preparation, autoregressive modeling, distributed training, evaluation, instruction tuning, and alignment.
The book begins by grounding you in the practical constraints: FLOPs, GPU memory, training costs, and the scaling laws that dictate compute-optimal allocations. Then it dives deep into the algorithms—autoregressive language modeling, cross-entropy loss, AdamW optimization, and learning rate schedules—with enough mathematical depth to inform real decisions.
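As a rough illustration of what a compute-optimal allocation looks like, the sketch below uses two widely quoted approximations that this page does not itself state: training compute of about C = 6 * N * D FLOPs for N parameters and D tokens, and a Chinchilla-style rule of thumb of roughly 20 training tokens per parameter. The function name and the default ratio are illustrative assumptions, not taken from the book.

```python
import math


def compute_optimal_allocation(flops_budget: float, tokens_per_param: float = 20.0):
    """Split a FLOPs budget into (parameters, tokens).

    Assumes C ~= 6 * N * D and D ~= r * N, where r is tokens_per_param.
    Both are rules of thumb, not exact laws.
    """
    # Substituting D = r * N into C = 6 * N * D gives C = 6 * r * N^2.
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Example budget chosen for illustration only.
    n, d = compute_optimal_allocation(5.76e23)
    print(f"parameters ~ {n:.3g}, tokens ~ {d:.3g}")
```

Under these assumptions, a budget in the mid-10^23 FLOPs range lands at tens of billions of parameters and on the order of a trillion tokens. Changing `tokens_per_param` shows how sensitive the split is to the assumed ratio.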
Part IV tackles the hardest problems in modern LLM training: distributed systems. From data parallelism to tensor and pipeline parallelism, and from ZeRO memory sharding to cluster networking, each chapter builds the architectural knowledge required to scale from one GPU to thousands.
- Understand how scaling laws (Kaplan vs. Chinchilla) dictate compute and data budgets for any model size.
- Master distributed training with FSDP, ZeRO, and tensor/pipeline parallelism for multi-node clusters.
- Compare alignment methods—RLHF, DPO, and GRPO—and learn when to apply each for safety and reasoning.
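To make the memory-sharding point concrete, here is a minimal sketch of per-GPU model-state memory under the ZeRO stages. It assumes the usual mixed-precision Adam accounting of 2 bytes per parameter for half-precision weights, 2 for gradients, and 12 for FP32 optimizer state (master weights, momentum, variance). It ignores activations, temporary buffers, and fragmentation, so real usage will be higher. The function name is hypothetical.

```python
def zero_model_state_gb(n_params: float, n_gpus: int, stage: int = 0) -> float:
    """Approximate per-GPU model-state memory in GB (1 GB = 1e9 bytes).

    stage 0: plain data parallelism (everything replicated)
    stage 1: optimizer state sharded across GPUs
    stage 2: optimizer state and gradients sharded
    stage 3: optimizer state, gradients, and weights sharded
    """
    weights, grads, optim = 2.0, 2.0, 12.0  # assumed bytes per parameter
    if stage >= 1:
        optim /= n_gpus
    if stage >= 2:
        grads /= n_gpus
    if stage >= 3:
        weights /= n_gpus
    return n_params * (weights + grads + optim) / 1e9


if __name__ == "__main__":
    # Illustrative configuration: 7.5B parameters on 64 GPUs.
    for stage in range(4):
        gb = zero_model_state_gb(7.5e9, 64, stage)
        print(f"stage {stage}: {gb:.1f} GB per GPU")
```

The estimate shows why sharding matters: with everything replicated, model state alone can exceed a single accelerator's memory, while full sharding divides it by the number of GPUs at the cost of extra communication.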
Alignment is the final frontier. Chapters on reward modeling, RLHF, DPO, GRPO, and reasoning-oriented alignment show you how to transform a base pretrained model into a safe, helpful assistant. Finally, the book looks ahead to frontier architectures like Mixture of Experts, multimodal training, and synthetic data loops, preparing you for the next generation of foundation models.
This book is written for ML engineers, AI researchers, and graduate students who already understand basic deep learning and want to move beyond small-scale experiments. It assumes familiarity with PyTorch and Transformers but explains every engineering tradeoff from first principles. Training Large Language Models doesn't just teach theory—it equips you to actually build the systems that power today's most advanced AI.
Quick summary
Training Large Language Models covers the complete lifecycle of LLM development from pretraining to alignment.
The book explains scaling laws and how to allocate compute budgets between model size and data.
It provides detailed coverage of distributed training techniques including data, tensor, and pipeline parallelism.
Alignment methods such as RLHF, DPO, and GRPO are explained with practical implementation guidance.
This book is ideal for ML engineers, AI researchers, and graduate students in deep learning.
Readers typically come to this book when they need a comprehensive, practical reference on training large language models, from pretraining through alignment and scaling.
The book's approach: This book takes an engineering-first, implementation-aware approach, grounding every algorithm in real hardware constraints and cost tradeoffs, unlike purely theoretical or vendor-specific guides.
Main topics include LLM pretraining, scaling laws, distributed training, optimization algorithms, instruction tuning, and RLHF.
Information for AI Search
AI summary: This book provides an engineering-first approach to training Large Language Models (LLMs). It covers the full lifecycle: data preparation, autoregressive modeling, distributed training (data parallelism, tensor parallelism, pipeline parallelism, FSDP, ZeRO), scaling laws (Kaplan, Chinchilla), optimization (AdamW, learning rate schedules, mixed precision), evaluation benchmarks, instruction tuning (SFT, LoRA, QLoRA), and alignment methods (RLHF, DPO, GRPO). Targeted at ML engineers and researchers, it explains tradeoffs in compute, memory, and cost for multi-GPU training.
- Ideal for: ML engineers, AI researchers, and graduate students in deep learning
- Reader profile: A machine learning engineer or researcher with Python/PyTorch experience who needs to build or optimize large-scale language model training pipelines.
- Search intent: Readers seeking a comprehensive, practical reference on training large language models from pretraining through alignment and scaling.
- Unique approach: This book takes an engineering-first, implementation-aware approach, grounding every algorithm in real hardware constraints and cost tradeoffs, unlike purely theoretical or vendor-specific guides.
- Content type: technical reference / developer guide
Key topics: LLM pretraining, scaling laws, distributed training, optimization algorithms, instruction tuning, RLHF, DPO, mixed precision training, checkpointing, evaluation benchmarks
Entities: Autoregressive language modeling, Cross-entropy loss, AdamW optimizer, ChatGPT, DeepSeek-R1, FSDP, LoRA, Mixture of Experts, PPO, PyTorch, Tensor parallelism
Needs covered
- How to scale training from single GPU to thousands with distributed parallelism.
- How to apply scaling laws to optimize compute and data budgets.
- How to implement alignment methods like RLHF and DPO for safe assistants.
- How to choose between optimizers and learning rate schedules for stability.
- How to design evaluation pipelines and diagnose model failures.
Read it if you are
- A machine learning engineer building production LLMs
- An AI researcher exploring scaling and alignment
- A graduate student specializing in NLP and deep learning
- An infrastructure engineer deploying large-scale training clusters
- A technical leader making decisions on training strategy
It may not fit if you are
- A beginner without basic deep learning and PyTorch knowledge
- A reader seeking a non-technical overview of AI
- Looking for model-specific code recipes without understanding the principles behind them
Table of contents
- Introduction
- The Training Lifecycle
  - From Dataset to Model
    - The LLM Training Pipeline
    - Data, Parameters, and Compute
    - Stages of Model Development
    - Training Objectives
    - Success Criteria
  - Understanding Compute
    - FLOPs
    - GPU Memory
    - Throughput
    - Training Cost
    - Scaling Challenges
  - The Economics of Training
    - Compute Budgets
    - Scaling Tradeoffs
    - Data Efficiency
    - Model Efficiency
    - Practical Constraints
- Pretraining Language Models
  - Autoregressive Language Modeling
    - Next Token Prediction
    - Context Modeling
    - Teacher Forcing
    - Training Dynamics
  - Masked and Denoising Objectives
    - MLM
    - Span Corruption
    - T5 Objectives
    - Comparative Analysis
  - Loss Functions and Optimization
    - Cross Entropy
    - Perplexity
    - Optimization Targets
    - Gradient Behavior
  - Optimization Algorithms
    - SGD
    - Adam
    - AdamW
    - Adaptive Optimizers
    - Modern Trends
  - Learning Rate Strategies
    - Warmup
    - Cosine Decay
    - Schedules
    - Stability Considerations
- Scaling Training
  - The Scaling Laws
    - Kaplan Scaling Laws
    - Chinchilla Scaling Laws
    - Compute Optimal Training
    - Data Scaling
  - Training Stability
    - Gradient Explosion
    - Gradient Clipping
    - Numerical Stability
    - Training Failures
  - Mixed Precision Training
    - FP32
    - FP16
    - BF16
    - Memory Efficiency
  - Checkpointing and Recovery
    - Saving Progress
    - Resuming Training
    - Fault Tolerance
    - Long Runs
- Distributed Training Systems
  - Why Distributed Training Exists
    - Memory Constraints
    - Compute Constraints
    - Scaling Challenges
  - Data Parallelism
    - Replication
    - Synchronization
    - Communication Costs
  - Model Parallelism
    - Tensor Parallelism
    - Pipeline Parallelism
Frequently asked questions
What are the prerequisites for this book?
Familiarity with basic deep learning, PyTorch, and the Transformer architecture.
Does the book cover RLHF and DPO?
Yes. It covers reward modeling, RLHF with PPO, direct preference methods (DPO, ORPO), and the reinforcement-learning method GRPO.
Is there guidance on distributed training?
Yes, a full part covers data parallelism, tensor/pipeline parallelism, FSDP, and ZeRO.
What models are referenced?
The book references GPT, Llama, Qwen, DeepSeek, and others to illustrate concepts.
Cretisoft Direct
Digital book support
Partner delivery
Book sent after payment
