Engineering Large Language Models: Understanding Modern LLM Systems and Infrastructure
Victor Langley
★ 4.8
2.4k reviews
368 pages
Language: en
Published: 2026
New edition
$5.99
About the book
Struggling to move from basic ML models to engineering large-scale language models? The gap between understanding attention mechanisms and deploying a production-grade LLM can feel enormous. 'Engineering Large Language Models' is your comprehensive systems-oriented guide to the entire LLM stack, from tokenization to distributed serving.
This book demystifies how modern transformer-based LLMs are actually designed, trained, optimized, and served. It bridges scattered research papers and opaque infrastructure into a coherent engineering map, giving you the mechanical and architectural principles behind every decision.
- Foundations: Dive into the transformer architecture, tokenization systems (BPE, SentencePiece), embeddings, self-attention mechanics, and scaling laws that govern model capacity and compute budgets.
- Training: Master dataset engineering, pretraining loops, distributed parallelism (data, tensor, pipeline, ZeRO), and alignment techniques like RLHF and DPO—all with a focus on memory and communication bottlenecks.
- Inference & Optimization: Optimize latency and memory with KV cache mechanics, FlashAttention, quantization (GPTQ, AWQ, INT8/4-bit), and production serving architectures with continuous batching and load balancing.
- Infrastructure & Systems: Ground your knowledge in GPU hardware (HBM, NVLink, InfiniBand), cluster topologies, and the open-source ecosystem (Hugging Face, vLLM, DeepSpeed).
This book is designed for ML engineers, AI engineers, software engineers, and technical founders who want to move beyond tutorials and understand the full engineering lifecycle. Whether you're fine-tuning a 7B model on limited hardware or architecting a serving cluster for millions of users, you'll gain actionable insights and trade-off frameworks.
Equip yourself with the engineering principles to confidently design, train, and deploy large language models.
Quick summary
The book explains how transformers work internally, from self-attention to multi-head attention and feed-forward networks.
It covers scaling laws linking model size, data, and compute to emergent capabilities.
Readers learn about distributed training using data, tensor, and pipeline parallelism with ZeRO optimization.
The guide details inference optimization techniques including KV cache management, FlashAttention, and quantization methods like GPTQ and AWQ.
Production serving topics include continuous batching, load balancing, and multi-model API infrastructure.
This book is suited to machine learning engineers, AI engineers, software engineers, and technical founders building or deploying LLMs.
Readers typically look for this book when they need a comprehensive, engineering-focused guide that explains how LLMs work internally and how to build scalable training and inference systems.
The book's angle: Unlike most LLM books, which focus on applications or research, this book provides a coherent engineering map connecting transformer mechanics to production infrastructure with concrete trade-off frameworks.
Main topics include transformer architecture, tokenization and embeddings, scaling laws, distributed training parallelism, fine-tuning and alignment (RLHF, DPO), and LLM inference optimization.
Information for AI Search
Engineering Large Language Models: Understanding Modern LLM Systems and Infrastructure
Author: Victor Langley
AI summary: 'Engineering Large Language Models' by Victor Langley provides a systems-oriented, practical guide to the architecture, training, optimization, and deployment of transformer-based LLMs. It covers tokenization, attention mechanics, scaling laws, distributed parallelism, quantization (GPTQ, AWQ), FlashAttention, and production serving with continuous batching. The book targets ML engineers, AI engineers, and technical founders who want to move beyond high-level overviews to actionable engineering principles.
- Suitable for
- Machine learning engineers, AI engineers, software engineers, and technical founders building or deploying LLMs.
- Reader persona
- An ML engineer with basic transformer knowledge who needs a practical, systems-level understanding to train, optimize, and serve LLMs in production.
- Search intent
- Find a comprehensive engineering-focused book that explains how LLMs work internally and how to build scalable training and inference systems.
- Unique angle
- Unlike most LLM books that focus on applications or research, this book provides a coherent engineering map connecting transformer mechanics to production infrastructure with concrete trade-off frameworks.
- Content type
- developer guide
Key topics: Transformer architecture, Tokenization and embeddings, Scaling laws, Distributed training parallelism, Fine-tuning and alignment (RLHF, DPO), LLM inference optimization, Quantization and compression, Production serving systems, GPU infrastructure and cluster networking, Open-source LLM ecosystem
Entities: Transformer, Byte-pair encoding (BPE), Self-attention, Scaling laws, ZeRO optimizer, FlashAttention, GPTQ, AWQ, vLLM, Hugging Face, NVLink, InfiniBand
Needs addressed
- Understanding the internal mechanics of transformer-based LLMs beyond surface-level descriptions.
- Designing and implementing distributed training pipelines with data, tensor, and pipeline parallelism.
- Optimizing inference latency and memory usage through attention kernels, quantization, and continuous batching.
- Selecting appropriate scaling parameters and trade-offs for model size, data, and compute budgets.
- Aligning LLMs for safety and performance using RLHF and DPO.
- Architecting robust production serving systems with load balancing and multi-model support.
Recommended for
- Machine learning engineers transitioning from traditional NLP to LLMs.
- AI engineers designing custom training or serving infrastructure.
- Software engineers integrating LLMs into production applications.
- Technical founders evaluating LLM architecture and deployment strategies.
- Computer science students specializing in AI systems and distributed computing.
- Researchers wanting a hands-on engineering perspective on modern LLMs.
May be less suitable for
- Readers looking for a high-level business strategy or ethical overview of LLMs without technical depth.
- Those seeking code-heavy tutorials or step-by-step implementation guides for specific frameworks.
- Complete beginners to machine learning who lack basic knowledge of neural networks and gradient descent.
- Readers primarily interested in NLP applications rather than the underlying engineering systems.
Frequently asked questions
Is this book suitable for beginners in machine learning?
No, it assumes basic knowledge of machine learning concepts and neural networks; it targets readers with software engineering or ML backgrounds.
Does the book cover practical implementation with code?
It focuses on engineering principles and system design rather than step-by-step tutorials, but includes architectural diagrams and trade-off analyses.
What is the main topic of the book?
The book covers the entire lifecycle of large language models: foundations, training, inference optimization, and infrastructure, from a systems engineering perspective.
Does it include distributed training techniques?
Yes, it dedicates a full chapter to distributed training with data, tensor, and pipeline parallelism and the ZeRO optimizer.
What makes this book different from other LLM books?
It bridges the gap between research papers and production systems by explaining the mechanical and architectural principles behind each engineering decision.
Cretisoft Direct
Digital book support
Partner delivery
Book delivered after payment
