What is Gradient Descent?

Gradient descent minimizes a function by iteratively stepping in the direction of steepest descent.

The Intuition

Imagine standing blindfolded on a hilly landscape. You want the lowest point. Strategy: feel the slope, step downhill, repeat.

The Math

Given a loss function $L(\theta)$, update parameters as:

$$\theta = \theta - \alpha \nabla L(\theta)$$

where $\alpha$ is the learning rate.

A Simple Implementation

def gradient_descent(grad_fn, theta, lr=0.01, steps=100):
    for _ in range(steps):
        theta -= lr * grad_fn(theta)
    return theta

Key Takeaways

  • Learning rate too large: diverges. Too small: slow.
  • Vanilla GD uses the full dataset per step.
  • Mini-batch GD is the practical default in deep learning.