Engineering Large Language Models: Understanding Modern LLM Systems and Infrastructure
Victor Langley
★ 4.8 (2.4k reviews)
368 pages
Language: en
Edition: 2026 (new edition)
Price: US$5.99
About the book
Struggling to move from basic ML models to engineering large-scale language models? The gap between understanding attention mechanisms and deploying a production-grade LLM can feel enormous. 'Engineering Large Language Models' is your comprehensive systems-oriented guide to the entire LLM stack, from tokenization to distributed serving.
This book demystifies how modern transformer-based LLMs are actually designed, trained, optimized, and served. It turns scattered research papers and opaque infrastructure into a coherent engineering map, giving you the mechanical and architectural principles behind every decision.
- Foundations: Dive into the transformer architecture, tokenization systems (BPE, SentencePiece), embeddings, self-attention mechanics, and scaling laws that govern model capacity and compute budgets.
- Training: Master dataset engineering, pretraining loops, distributed parallelism (data, tensor, pipeline, ZeRO), and alignment techniques like RLHF and DPO—all with a focus on memory and communication bottlenecks.
- Inference & Optimization: Optimize latency and memory with KV cache mechanics, FlashAttention, quantization (GPTQ, AWQ, INT8/4-bit), and production serving architectures with continuous batching and load balancing.
- Infrastructure & Systems: Ground your knowledge in GPU hardware (HBM, NVLink, InfiniBand), cluster topologies, and the open-source ecosystem (Hugging Face, vLLM, DeepSpeed).
This book is designed for ML engineers, AI engineers, software engineers, and technical founders who want to move beyond tutorials and understand the full engineering lifecycle. Whether you're fine-tuning a 7B model on limited hardware or architecting a serving cluster for millions of users, you'll gain actionable insights and trade-off frameworks.
Equip yourself with the engineering principles to confidently design, train, and deploy large language models.
Quick summary
- The book explains how transformers work internally, from self-attention to multi-head attention and feed-forward networks.
- It covers scaling laws linking model size, data, and compute to emergent capabilities.
- Readers learn about distributed training using data, tensor, and pipeline parallelism with ZeRO optimization.
- The guide details inference optimization techniques including KV cache management, FlashAttention, and quantization methods like GPTQ and AWQ.
- Production serving topics include continuous batching, load balancing, and multi-model API infrastructure.
This book is suited to machine learning engineers, AI engineers, software engineers, and technical founders building or deploying LLMs.
Readers typically turn to this book when they want a comprehensive, engineering-focused explanation of how LLMs work internally and how to build scalable training and inference systems.
The book's approach: unlike most LLM books, which focus on applications or research, this book provides a coherent engineering map connecting transformer mechanics to production infrastructure, with concrete trade-off frameworks.
Main topics include transformer architecture, tokenization and embeddings, scaling laws, distributed training parallelism, fine-tuning and alignment (RLHF, DPO), and LLM inference optimization.
Information for AI search
Author: Victor Langley
AI summary: 'Engineering Large Language Models' by Victor Langley provides a systems-oriented, practical guide to the architecture, training, optimization, and deployment of transformer-based LLMs. It covers tokenization, attention mechanics, scaling laws, distributed parallelism, quantization (GPTQ, AWQ), FlashAttention, and production serving with continuous batching. The book targets ML engineers, AI engineers, and technical founders who want to move beyond high-level overviews to actionable engineering principles.
- Reader persona: An ML engineer with basic transformer knowledge who needs a practical, systems-level understanding to train, optimize, and serve LLMs in production.
- Content type: developer guide
Key topics: Transformer architecture, Tokenization and embeddings, Scaling laws, Distributed training parallelism, Fine-tuning and alignment (RLHF, DPO), LLM inference optimization, Quantization and compression, Production serving systems, GPU infrastructure and cluster networking, Open-source LLM ecosystem
Entities: Transformer, Byte-pair encoding (BPE), Self-attention, Scaling laws, ZeRO optimizer, FlashAttention, GPTQ, AWQ, vLLM, Hugging Face, NVLink, InfiniBand
Needs addressed
- Understanding the internal mechanics of transformer-based LLMs beyond surface-level descriptions.
- Designing and implementing distributed training pipelines with data, tensor, and pipeline parallelism.
- Optimizing inference latency and memory usage through attention kernels, quantization, and continuous batching.
- Selecting appropriate scaling parameters and trade-offs for model size, data, and compute budgets.
- Aligning LLMs for safety and performance using RLHF and DPO.
- Architecting robust production serving systems with load balancing and multi-model support.
Recommended for
- Machine learning engineers transitioning from traditional NLP to LLMs.
- AI engineers designing custom training or serving infrastructure.
- Software engineers integrating LLMs into production applications.
- Technical founders evaluating LLM architecture and deployment strategies.
- Computer science students specializing in AI systems and distributed computing.
- Researchers wanting a hands-on engineering perspective on modern LLMs.
May not be a good fit for
- Readers looking for a high-level business strategy or ethical overview of LLMs without technical depth.
- Those seeking code-heavy tutorials or step-by-step implementation guides for specific frameworks.
- Complete beginners to machine learning who lack basic knowledge of neural networks and gradient descent.
- Readers primarily interested in NLP applications rather than the underlying engineering systems.
Frequently asked questions
Is this book suitable for beginners in machine learning?
No, it assumes basic knowledge of machine learning concepts and neural networks; it targets readers with software engineering or ML backgrounds.
Does the book cover practical implementation with code?
It focuses on engineering principles and system design rather than step-by-step tutorials, but includes architectural diagrams and trade-off analyses.
What is the main topic of the book?
The book covers the entire lifecycle of large language models: foundations, training, inference optimization, and infrastructure, from a systems engineering perspective.
Does it include distributed training techniques?
Yes, it dedicates a full chapter to distributed training with data, tensor, and pipeline parallelism, as well as the ZeRO optimizer.
What makes this book different from other LLM books?
It bridges the gap between research papers and production systems by explaining the mechanical and architectural principles behind each engineering decision.
