The author then translates this graphical representation into code: a tensor class with functions for addition, subtraction, and multiplication, plus a function that calculates the derivative with respect to each function argument. After testing the code by creating tensors and performing operations on them, the author concludes by explaining how to find all paths from the tensor being differentiated to the input tensors, and how to calculate the derivative of a variable with respect to its inputs.
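The approach described above can be sketched roughly as follows. This is a minimal illustration, not the article's actual code: the names `Tensor`, `add`, `sub`, `mul`, `derivative`, and the `local_derivatives` attribute are assumptions chosen for this example, and values are plain scalars.

```python
class Tensor:
    """A scalar value plus a record of how it was computed (hypothetical design)."""

    def __init__(self, value, args=(), local_derivatives=()):
        self.value = value
        # The tensors this one was computed from (empty for inputs).
        self.args = args
        # Derivative of this tensor with respect to each entry in args.
        self.local_derivatives = local_derivatives


def add(a, b):
    # d(a + b)/da = 1, d(a + b)/db = 1
    return Tensor(a.value + b.value, (a, b), (1.0, 1.0))


def sub(a, b):
    # d(a - b)/da = 1, d(a - b)/db = -1
    return Tensor(a.value - b.value, (a, b), (1.0, -1.0))


def mul(a, b):
    # d(a * b)/da = b, d(a * b)/db = a
    return Tensor(a.value * b.value, (a, b), (b.value, a.value))


def derivative(output, wrt):
    """Walk every path from `output` back to `wrt`.

    Edge derivatives are multiplied along each path, and the
    products are summed over all paths.
    """
    if output is wrt:
        return 1.0
    total = 0.0
    for arg, local in zip(output.args, output.local_derivatives):
        total += local * derivative(arg, wrt)
    return total


# Usage: z = x*x + x*y, so dz/dx = 2x + y and dz/dy = x.
x = Tensor(3.0)
y = Tensor(4.0)
z = add(mul(x, x), mul(x, y))
dz_dx = derivative(z, x)
dz_dy = derivative(z, y)
```

The recursive walk revisits shared sub-expressions once per path, which keeps the sketch short but scales poorly on large graphs; a practical implementation would typically accumulate derivatives in a single backward pass instead.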
Key takeaways:
- The article provides a detailed guide on how to build a language model from scratch, focusing on automatic differentiation.
- The author explains how to create a tensor and perform simple operations like addition, subtraction, and multiplication.
- The article also explains how to calculate scalar derivatives and chain functions together, using the chain rule from calculus.
- The author demonstrates how to turn the equations into a graph and label each edge with the appropriate derivative, then find every path from the output to the input variable, multiply the derivatives along each path, and sum the results over all paths.
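As a small worked example of the path-based rule (the functions here are chosen for illustration and are not taken from the article), let $z = u \cdot v$ with $u = x + y$ and $v = x - y$. Each edge of the graph carries one local derivative, and there are two paths from $z$ to each input, one through $u$ and one through $v$:

```latex
\begin{aligned}
\frac{\partial z}{\partial u} &= v, &
\frac{\partial z}{\partial v} &= u, \\
\frac{\partial u}{\partial x} &= 1, &
\frac{\partial u}{\partial y} &= 1, \\
\frac{\partial v}{\partial x} &= 1, &
\frac{\partial v}{\partial y} &= -1, \\[4pt]
\frac{\partial z}{\partial x}
  &= \frac{\partial z}{\partial u}\frac{\partial u}{\partial x}
   + \frac{\partial z}{\partial v}\frac{\partial v}{\partial x}
   = v + u = 2x, \\
\frac{\partial z}{\partial y}
  &= \frac{\partial z}{\partial u}\frac{\partial u}{\partial y}
   + \frac{\partial z}{\partial v}\frac{\partial v}{\partial y}
   = v - u = -2y.
\end{aligned}
```

Both results agree with differentiating $z = x^2 - y^2$ directly.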