Choosing the Learning Rate α
If α is sufficiently small, J(θ) will decrease on every iteration. This is how we know for sure that gradient descent is working.
However, if gradient descent is taking too long to converge, α may be too small.
On the other hand, if α is too large, we may overshoot the minimum, causing the cost function to increase instead of decrease.
So, to choose an efficient learning rate, plot J(θ) on the Y-axis against the number of iterations on the X-axis for different values of α (say 0.01, 0.03, 0.1, 0.3, 1, i.e. increasing roughly threefold) and see which value of α makes gradient descent converge the quickest, as in the sketch below.
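A minimal sketch of this procedure, assuming a toy one-feature linear regression problem (the data, the `cost` and `gradient_descent` helpers, and the specific α values tried are illustrative, not from the original notes): it records J(θ) at each iteration for several learning rates and plots the curves so the fastest-converging α can be read off the graph.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data: y is roughly 2*x + 1 with a little noise (assumed example data)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + 1 + 0.1 * rng.standard_normal(50)
X = np.column_stack([np.ones_like(x), x])   # add an intercept column

def cost(theta):
    """Mean squared error cost J(theta)."""
    residual = X @ theta - y
    return (residual @ residual) / (2 * len(y))

def gradient_descent(alpha, iterations=100):
    """Run batch gradient descent and return the cost at each iteration."""
    theta = np.zeros(2)
    history = []
    for _ in range(iterations):
        grad = X.T @ (X @ theta - y) / len(y)   # gradient of J(theta)
        theta -= alpha * grad                   # gradient descent update
        history.append(cost(theta))
    return history

# Try learning rates increasing roughly threefold, as suggested above
for alpha in [0.01, 0.03, 0.1, 0.3, 1.0]:
    plt.plot(gradient_descent(alpha), label=f"alpha = {alpha}")

plt.xlabel("# of iterations")
plt.ylabel("J(theta)")
plt.legend()
plt.show()
```

On the resulting plot, a curve that decreases on every iteration indicates gradient descent is working for that α; a curve that flattens out very slowly suggests α is too small, and one that rises or oscillates suggests α is too large.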