The Foundations of Large Language Models: Understanding Language, Neural Networks, and Transformers
Miles Thornton
Book 1 · #1 · ★ 4.8 (2.4k reviews)
Pages: 568
Language: English (en)
Published: 2026 (new edition)
Price: $4.99
Book introduction
Large language models generate fluent text, but they don't understand a single word. How can a machine that merely predicts the next token produce coherent paragraphs, answer questions, and even write code? The answer lies in a carefully engineered stack of representations, architectures, and training objectives—a stack that this book systematically deconstructs.
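To make "merely predicts the next token" concrete, here is a minimal sketch of the simplest possible language model: a bigram count table with greedy decoding. It is not taken from the book. The helper names `train_bigram` and `generate` and the toy corpus are illustrative assumptions. An LLM replaces the count table with a Transformer that conditions on the whole preceding context, but the loop of predicting one token, appending it, and repeating is the same idea.

```python
from collections import Counter, defaultdict

def train_bigram(tokens: list[str]) -> dict[str, Counter]:
    """Count how often each token is followed by each other token (hypothetical helper)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts: dict[str, Counter], start: str, max_new_tokens: int) -> list[str]:
    """Greedy autoregressive decoding: append the most frequent next token, then repeat."""
    out = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:  # token never seen with a successor: stop early
            break
        out.append(followers.most_common(1)[0][0])
    return out

corpus = "the cat sat on the mat . the cat sat on the rug .".split()
model = train_bigram(corpus)
print(" ".join(generate(model, "the", 4)))  # the cat sat on the
```

Because this model sees only one previous token, greedy decoding on this corpus soon starts repeating itself. Longer, richer context is exactly what the later architectures add.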
"The Foundations of Large Language Models: Understanding Language, Neural Networks, and Transformers" by Miles Thornton is a rigorous yet accessible guide that takes you from the raw complexity of human language to the inner workings of modern LLMs. Written for engineers and data scientists, it bridges the gap between conceptual understanding and hands-on implementation in PyTorch. No black boxes. No magic. Just a logical progression from symbols to vectors, from RNNs to attention, and from the original Transformer to the scaling laws that govern today's billion-parameter models.
What makes this book different is its premise: every architectural breakthrough was a response to a concrete engineering limitation. The fixed context window of feedforward nets gave way to RNNs, which suffered from vanishing gradients. LSTMs mitigated the problem but remained inherently sequential. Attention removed the fixed-length bottleneck between encoder and decoder, and the Transformer, built on attention without recurrence, was born. You won't just learn what these components do; you'll understand why they exist.
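The vanishing-gradient problem can be seen with a few lines of arithmetic. The sketch below uses a hypothetical scalar tanh RNN, h_t = tanh(w * h_{t-1} + u * x_t), and is not code from the book. It applies the chain rule through time. The gradient of the last hidden state with respect to the first is a product of per-step factors w * (1 - h_t**2). When each factor is below 1 in magnitude, that product shrinks geometrically with sequence length.

```python
import math

def grad_first_to_last(w: float, u: float, xs: list[float]) -> float:
    """d h_T / d h_0 for the hypothetical scalar RNN h_t = tanh(w*h_{t-1} + u*x_t), h_0 = 0.

    By the chain rule this is the product of the per-step factors
    d h_t / d h_{t-1} = w * (1 - h_t**2).
    """
    h, grad = 0.0, 1.0
    for x in xs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)  # multiply in this step's local derivative
    return grad

for T in (5, 20, 50):
    g = grad_first_to_last(w=0.9, u=1.0, xs=[1.0] * T)
    print(f"T={T:3d}  |dh_T/dh_0| = {abs(g):.2e}")
```

With w = 0.9, every factor is at most 0.9, so the gradient is bounded by 0.9^T. Tanh saturation makes it far smaller still. LSTMs ease this by carrying information along an additive cell-state path, but they still process tokens one at a time.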
- Build a miniature GPT from scratch using PyTorch, with complete tensor-shape walkthroughs.
- Master the mechanics of self-attention, multi-head attention, and positional encodings.
- Trace the evolution from Word2Vec to subword tokenization and modern foundation models.
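As a taste of the tensor-shape reasoning involved, here is a minimal sketch of single-head causal self-attention, softmax(Q K^T / sqrt(d_k) + mask) V. It is written with plain Python lists rather than PyTorch so it runs without dependencies. The function names are illustrative assumptions, not the book's API. Learned projections and positional encodings are omitted. Multi-head attention would run this once per head on separately projected inputs, then concatenate the results before a final projection.

```python
import math

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n, k) matrix by a (k, m) matrix, giving (n, m)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    """(n, m) -> (m, n)."""
    return [list(col) for col in zip(*a)]

def softmax(row: list[float]) -> list[float]:
    """Softmax over one row; subtracting the max keeps exp() from overflowing."""
    m = max(row)
    exps = [math.exp(val - m) for val in row]  # exp(-inf) == 0.0, so masked slots get zero weight
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k) + mask) V for a single head.

    Shapes: q, k are (seq_len, d_k); v is (seq_len, d_v); result is (seq_len, d_v).
    The causal mask lets position i attend only to positions j <= i.
    """
    d_k = len(q[0])
    scores = matmul(q, transpose(k))  # (seq_len, seq_len)
    weights = []
    for i, row in enumerate(scores):
        masked = [s / math.sqrt(d_k) if j <= i else -math.inf for j, s in enumerate(row)]
        weights.append(softmax(masked))  # each row of weights sums to 1
    return matmul(weights, v)  # (seq_len, seq_len) @ (seq_len, d_v) -> (seq_len, d_v)

# Toy input: 3 tokens with d_k = d_v = 2 (a real model derives q, k, v from learned projections).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(q=x, k=x, v=x)
print(len(out), len(out[0]))  # 3 2
print(out[0])                 # [1.0, 0.0]: the first token can only attend to itself
```

Each output row is a weighted average of the value rows at or before that position. This is why a decoder-only model can be trained on every position of a sequence in parallel without seeing future tokens.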
This book is designed for software engineers, data scientists, and AI students who are comfortable with Python and basic machine learning. It assumes no prior NLP knowledge but expects curiosity about how models actually work under the hood. If you've ever felt uneasy treating LLMs as black boxes, this book will replace uncertainty with concrete, verifiable understanding.
The journey spans 7 parts and 28 chapters, each building on the last. Part I sets the linguistic and probabilistic foundations. Part II transforms text into vectors via embeddings and tokenization. Part III introduces neural networks, RNNs, and their limitations. Part IV unveils the attention mechanism. Part V assembles the full Transformer encoder-decoder and trains it. Part VI explores the modern LLM zoo: GPT, BERT, T5, and open-source variants. Part VII dives into scaling laws, long-context techniques, and architectural innovations like mixture-of-experts and grouped-query attention.
By the end, you won't just know how to call an API; you'll understand the design decisions behind every layer of a large language model. You'll be equipped to experiment, customize, and reason about future developments in the field.
Quick summary
- This book teaches you to build a miniature GPT from scratch using PyTorch.
- It explains attention mechanisms, word embeddings, and scaling laws.
- Readers will understand the design decisions behind modern LLMs like GPT, BERT, and LLaMA.
- The book assumes basic Python and machine learning knowledge; no NLP background is required.
- It includes a step-by-step implementation of the Transformer encoder-decoder architecture.
This book is a good fit for software engineers, data scientists, and AI/ML students.
Readers often come to this book when they want to gain a deep, hands-on understanding of how large language models work internally, from tokenization to transformer architecture, and to implement a miniature GPT model.
The book's angle: unlike most LLM books that focus on usage, this book emphasizes building from scratch, explaining every design decision as a solution to a concrete limitation in earlier architectures.
Main topics include large language models, word embeddings, tokenization, the attention mechanism, the Transformer architecture, and GPT.
AI Search information
AI summary: This book provides a comprehensive introduction to the theoretical and engineering foundations of Large Language Models (LLMs). It covers the evolution from word embeddings to the Transformer architecture, and includes a complete implementation of a miniature GPT model in PyTorch. Targeted at engineers and data scientists with basic ML knowledge, it bridges the gap between concept and code.
- Best for: software engineers, data scientists, and AI/ML students
- Reader persona: an engineer or data scientist who wants to move beyond using pre-trained models to understanding and building foundational LLM architectures from scratch
- Search intent: to gain a deep, hands-on understanding of how large language models work internally, from tokenization to transformer architecture, and to implement a miniature GPT model
- Unique angle: unlike most LLM books that focus on usage, this book emphasizes building from scratch, explaining every design decision as a solution to a concrete limitation in earlier architectures
- Content type: technical reference and practical guide
Key topics: Large language models, Word embeddings, Tokenization, Attention mechanism, Transformer architecture, GPT, BERT, Scaling laws, Mixture of Experts, Context extension
Entities: Transformer, PyTorch, GPT, BERT, Word2Vec, LLaMA, Self-attention, Positional encoding, Chinchilla scaling laws, Mixture-of-Experts, KV cache, Chain-of-thought
Needs addressed
- Understanding how LLMs work under the hood
- Implementing a transformer from scratch in code
- Grasping the evolution from RNNs to attention
- Demystifying scaling laws and emergent capabilities
- Building practical intuition for training and evaluating language models
Read if
- Software engineers transitioning into AI
- Data scientists wanting to go beyond API calls
- Machine learning students seeking a rigorous foundation in NLP
- NLP practitioners wanting to understand modern architectures
- Researchers entering the LLM field
- Technical professionals curious about AI systems
May not fit if
- Complete beginners without programming experience
- Readers looking for a high-level overview without implementation details
- Professionals only interested in using pre-trained models via APIs
Table of contents
- Introduction: How to Read This Book
- Part I: Understanding Language and Intelligence
  - Why Human Language Is Difficult for Machines
    - The Nature of Human Language
    - Ambiguity and Context
    - Meaning Beyond Words
    - Language as a Prediction Problem
    - The Dream of Artificial Language Understanding
  - The Evolution of Language Processing
    - Rule-Based Systems
    - Statistical NLP
    - Machine Learning Approaches
    - Deep Learning Revolution
    - The Rise of Foundation Models
  - What Is a Language Model?
    - Predicting the Next Word
    - Probability and Language
    - Context and Memory
    - Measuring Language Understanding
    - Why Language Models Matter
- Part II: Representing Language Numerically
  - From Symbols to Numbers
    - Why Machines Need Numbers
    - One-Hot Encoding
    - Sparse Representations
    - Similarity and Distance
    - The Curse of Dimensionality
  - Word Embeddings
    - The Distributional Hypothesis
    - Dense Vector Representations
    - Semantic Relationships
    - Vector Arithmetic
    - Understanding Embedding Spaces
  - Word2Vec, GloVe, and FastText
    - CBOW
    - Skip-Gram
    - Negative Sampling
    - Global Co-Occurrence Methods
    - Subword Modeling
  - Tokenization
    - Why Tokenization Matters
    - Character-Level Models
    - Word-Level Models
    - Subword Models
    - BPE
    - WordPiece
    - SentencePiece
    - Modern Tokenizers
- Part III: Neural Networks for Language
  - Neural Networks Fundamentals
    - Artificial Neurons
    - Layers and Representations
    - Activation Functions
    - Backpropagation
    - Learning from Data
  - Neural Language Models
    - Feedforward Language Models
    - Context Windows
    - Limitations of Fixed Context
    - Early Neural NLP
  - Recurrent Neural Networks
    - Sequential Processing
    - Hidden States
    - Information Flow
    - Vanishing Gradients
    - Exploding Gradients
  - LSTM and GRU
    - Long-Term Dependencies
    - Memory Cells
    - Gates and Control Mechanisms
    - Practical Successes
    - Remaining Limitations
- Part IV: The Attention Revolution
  - Why Attention Changed Everything
    - Sequence Bottlenecks
    - Long Context Challenges
    - The Encoder-Decoder Problem
    - Selective Focus
  - Understanding Attention
    - Query
Frequently asked questions
Do I need prior NLP experience to read this book?
No. The book assumes only basic Python and machine learning knowledge; no NLP background is required.
Will I be able to build my own LLM after reading?
Yes, you will implement a miniature GPT model in PyTorch and understand the key components of modern LLMs.
What programming language and framework are used?
Python with PyTorch for all implementations.
Does the book cover the latest models like GPT-4?
It covers the architecture and scaling laws that underpin models like GPT-4, but focuses on foundational concepts rather than specific API features.
How is this book different from other transformer books?
It emphasizes building from scratch and explains why each architectural component exists, not just how it works.
