Optimization

What is Gradient Descent? Gradient descent minimizes a function by iteratively stepping in the direction of steepest descent. The Intuition Imagine standing blindfolded on a hilly landscape. You want the lowest point. Strategy: feel the slope, step downhill, repeat. The Math Given a loss function $L(\theta)$, update parameters as: $$\theta = \theta - \alpha \nabla L(\theta)$$ where $\alpha$ is the learning rate. A Simple Implementation def gradient_descent(grad_fn, theta, lr=0.01, steps=100): for _ in range(steps): theta -= lr * grad_fn(theta) return theta Key Takeaways Learning rate too large: diverges. Too small: slow. Vanilla GD uses the full dataset per step. Mini-batch GD is the practical default in deep learning.