The first question I had to answer was: why is backward differentiation necessary at all? In a neural network, the loss is computed at the very end of the forward pass. To update the weights, we need ...