How Transformer LLMs Work [Free Course]
Maarten and I teamed up with Andrew Ng to bring you the most accessible and up-to-date intro to Transformers yet!
Enroll for free now: https://bit.ly/4aRnn7Z
GitHub repo: https://github.com/HandsOnLLM/Hands-On-Large-Language-Models
We're ecstatic to bring you "How Transformer LLMs Work" -- a free course with ~90 minutes of video, code, and crisp visuals and animations that explain the modern Transformer architecture, tokenizers, embeddings, and mixture-of-experts models.
Maarten and I have developed much of this visual language over the last several years (tens of thousands of iterations across hundreds of figures) for the book, informed by many incredible colleagues at Cohere, C4AI, and the open-source and open-science ML community. Collaborating with the legendary Andrew Ng and the team at DeepLearning.AI gave us the chance to take those figures to the next level, with animations and a concise narrative meant to enable technical learners to pick up an ML paper and understand its architecture description.

In this course, you'll learn how the transformer network architecture that powers LLMs works. You'll build intuition for how LLMs process text and work with code examples that illustrate the key components of the transformer architecture.
Key topics covered in this course include:
The evolution of how language has been represented numerically, from the Bag-of-Words model through Word2Vec embeddings to the transformer architecture that captures word meanings in full context.
How LLM inputs are broken down into tokens, which represent words or word pieces, before they are sent to the language model (see the tokenizer sketch after this list).
The details of a transformer and its three main stages: tokenization and embedding, the stack of transformer blocks, and the language model head.
The details of the transformer block, including attention, which calculates relevance scores between tokens, followed by the feedforward layer, which incorporates information stored during training (a toy attention sketch follows the list).
How cached calculations make transformer inference faster, how the transformer block has evolved in the years since the original paper, and why transformers continue to be widely used (see the caching sketch below).
An exploration of how recent models are implemented in the Hugging Face Transformers library (see the model-inspection sketch below).
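To make the tokenization step concrete, here is a minimal sketch using the Hugging Face Transformers library (the checkpoint name and example sentence are placeholders, not the course's exact code):

```python
# Minimal tokenization sketch: text -> tokens -> integer IDs.
from transformers import AutoTokenizer

# "bert-base-uncased" is just an example checkpoint; any tokenizer works similarly.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Transformers process language one token at a time."
tokens = tokenizer.tokenize(text)   # word pieces, e.g. ["transformers", "process", ...]
ids = tokenizer.encode(text)        # the integer IDs the model actually receives

print(tokens)
print(ids)
```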
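To illustrate the relevance scores that attention computes, here is a toy scaled dot-product attention sketch with made-up numbers (an illustration, not the course's implementation):

```python
# Toy attention sketch: relevance scores between token vectors.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# 3 tokens with 4-dimensional query/key/value vectors (random, purely illustrative).
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))

scores = Q @ K.T / np.sqrt(K.shape[-1])  # how relevant each token is to every other token
weights = softmax(scores, axis=-1)       # each row sums to 1
output = weights @ V                     # weighted mix of value vectors

print(weights.round(2))
```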
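As a rough sketch of why cached calculations matter at inference time, you can time generation with and without the key/value cache (the model choice and token count are arbitrary, and exact timings will vary by machine):

```python
# Sketch: generation speed with and without the key/value cache.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # a small causal LM is enough for this comparison
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

inputs = tokenizer("The transformer block", return_tensors="pt")

for use_cache in (True, False):
    start = time.time()
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=100, use_cache=use_cache,
                       pad_token_id=tokenizer.eos_token_id)
    print(f"use_cache={use_cache}: {time.time() - start:.2f}s")
```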
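Finally, printing a model from the Transformers library is a quick way to see the embedding layer, the stack of transformer blocks, and the language model head in one place ("gpt2" is only an example checkpoint):

```python
# Inspecting a model's architecture with Hugging Face Transformers.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

print(model)          # embeddings, the stack of transformer blocks, and the LM head
print(model.lm_head)  # the head that maps final hidden states to vocabulary logits
```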
By the end of this course, you'll have a deep understanding of how LLMs process language, and you'll be able to read papers that describe new models and follow the architectural details they lay out. This intuition will help improve your approach to building LLM applications.
We hope you enjoy it!