
Gradient Descent into Madness - Building an LLM from scratch

Feb 16, 2024 - news.bensbites.co
The article walks through building a language model from scratch, starting with automatic differentiation. The author first creates a tensor, the mathematical object to be differentiated, then reviews scalar derivatives and how to compute them. From there, the article covers chaining functions together and differentiating nested functions with the Chain Rule, and shows how representing the equations as a directed acyclic graph (DAG) simplifies the process of finding derivatives.
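The Chain Rule step above can be illustrated numerically. This is a minimal sketch (my own example, not code from the article): for a composition y = f(g(x)), the rule says dy/dx = f'(g(x)) · g'(x), which we can check against a finite-difference approximation.

```python
# Chain Rule sketch: y = f(g(x)) with f(u) = u**2 and g(x) = 3*x + 2.
# (These particular f and g are assumptions for illustration.)
def g(x):
    return 3 * x + 2

def f(u):
    return u ** 2

def dy_dx(x):
    # Chain Rule: f'(u) = 2*u evaluated at u = g(x), times g'(x) = 3.
    return 2 * g(x) * 3

# Check against a central-difference numerical approximation at x = 1.0.
x, h = 1.0, 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)
print(dy_dx(x), numeric)  # both approximately 30.0
```

The agreement between the analytic and numerical values is the sanity check that autodiff implementations typically rely on.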

The author then translates this graphical representation into code: a class for the tensor, with functions for addition, subtraction, and multiplication, plus a function that computes the derivative with respect to each of the function's arguments. After testing the code by creating tensors and combining them with these operations, the author shows how to find all paths from the tensor being differentiated back to the input tensors, and how to use those paths to calculate the derivative of a variable with respect to its inputs.
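The approach described above can be sketched in a few lines. This is my own minimal reconstruction, not the article's actual code: each tensor records its arguments and the local derivative with respect to each argument, and the overall derivative is the sum, over every path from output to input in the DAG, of the product of the edge derivatives along that path.

```python
# Minimal scalar-autodiff sketch (class and method names are assumptions,
# not taken from the article).
class Tensor:
    def __init__(self, value, args=(), local_grads=()):
        self.value = value
        self.args = args                # parent tensors: edges of the DAG
        self.local_grads = local_grads  # d(self)/d(arg) for each parent

    def __add__(self, other):
        return Tensor(self.value + other.value, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        return Tensor(self.value - other.value, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        return Tensor(self.value * other.value, (self, other),
                      (other.value, self.value))

    def grad(self, wrt):
        """Derivative of self with respect to tensor `wrt`: sum over all
        DAG paths of the product of edge derivatives along each path."""
        if self is wrt:
            return 1.0
        return sum(g * arg.grad(wrt)
                   for arg, g in zip(self.args, self.local_grads))

x = Tensor(3.0)
y = Tensor(4.0)
z = x * y + x  # dz/dx = y + 1 = 5.0, dz/dy = x = 3.0
print(z.value, z.grad(x), z.grad(y))  # 15.0 5.0 3.0
```

Note that `grad` here recursively re-walks the graph for each input, which mirrors the path-enumeration idea the summary describes; production autodiff systems instead do a single reverse pass, accumulating gradients at each node.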

Key takeaways:

  • The article provides a detailed guide to building a language model from scratch, focusing on automatic differentiation.
  • It explains how to create a tensor and support simple operations on it: addition, subtraction, and multiplication.
  • It also covers computing scalar derivatives and chaining functions together using the Chain Rule from calculus.
  • Finally, it shows how to turn the equations into a graph, label each edge with the appropriate derivative, then find every path from the output to the input variable and multiply the derivatives along each path together.
