Gradient Descent
So we have our hypothesis function and we have a way of measuring how accurate it is. Now what we need is a way to automatically improve our hypothesis function. That's where gradient descent comes in.
To get the most accurate hypothesis, we must minimize the cost function. (This means that we have to find values for $\theta_0$ and $\theta_1$ such that the cost function $J(\theta_0, \theta_1)$ is at its minimum value.)
To do so, we take the derivative of our cost function. The slope of the tangent line at a point is the derivative at that point, and it tells us which direction to move in. We take steps down the cost function in that direction, with the size of each step determined by the parameter $\alpha$, called the learning rate.
The gradient descent equation is:
$$
\begin{aligned}
& \text{repeat until convergence:} \; \{ \\
& \qquad \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad \text{for } j = 0 \text{ and } j = 1 \\
& \}
\end{aligned}
$$
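The following is a minimal sketch of this update rule, assuming an arbitrary bowl-shaped cost function $J(\theta_0, \theta_1)$; the partial derivatives are approximated numerically purely for illustration, and the convergence threshold is an assumption.

```python
def gradient_descent_step(J, theta0, theta1, alpha, eps=1e-6):
    # Approximate the partial derivatives of J at the current point.
    dJ_dtheta0 = (J(theta0 + eps, theta1) - J(theta0 - eps, theta1)) / (2 * eps)
    dJ_dtheta1 = (J(theta0, theta1 + eps) - J(theta0, theta1 - eps)) / (2 * eps)
    # Update both parameters simultaneously, using the old values of each.
    return theta0 - alpha * dJ_dtheta0, theta1 - alpha * dJ_dtheta1

# Example: repeat until convergence on an arbitrary bowl-shaped cost.
J = lambda t0, t1: (t0 - 1) ** 2 + (t1 + 2) ** 2
theta0, theta1 = 0.0, 0.0
for _ in range(1000):                      # "repeat until convergence"
    new0, new1 = gradient_descent_step(J, theta0, theta1, alpha=0.1)
    if abs(new0 - theta0) < 1e-9 and abs(new1 - theta1) < 1e-9:
        break
    theta0, theta1 = new0, new1
```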
If $\alpha$ is too small, gradient descent is slow, i.e. it takes longer to reach the minimum value of $J$
If $\alpha$ is too large, we may overshoot the minimum, i.e. gradient descent may fail to converge, or even diverge
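A quick way to see this trade-off is on the toy cost $J(\theta) = \theta^2$, whose derivative is $2\theta$ and whose minimum is at $\theta = 0$; the cost function and the $\alpha$ values below are assumptions chosen only for the demonstration.

```python
def run(alpha, steps=25, theta=1.0):
    for _ in range(steps):
        theta = theta - alpha * (2 * theta)   # theta := theta - alpha * dJ/dtheta
    return theta

print(run(alpha=0.01))   # too small: still far from 0 after 25 steps (slow)
print(run(alpha=0.5))    # well chosen: reaches the minimum at 0 quickly
print(run(alpha=1.1))    # too large: overshoots on every step and diverges
```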
The equation for gradient descent for Linear Regression in One Variable can be obtained by substituting the hypothesis function and the cost function in the gradient descent formula above. We get:
$$
\begin{aligned}
& \text{repeat until convergence:} \; \{ \\
& \qquad \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \\
& \qquad \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} \\
& \}
\end{aligned}
$$
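Below is a sketch of these update rules in plain Python. The data, the learning rate, and the fixed iteration count are placeholder assumptions; in practice you would stop when the parameters (or the cost) stop changing.

```python
def batch_gradient_descent(x, y, alpha=0.01, iterations=1000):
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):                    # "repeat until convergence"
        h = [theta0 + theta1 * xi for xi in x]     # hypothesis on every example
        # Each step sums the error over all m training examples ("batch").
        grad0 = sum(h[i] - y[i] for i in range(m)) / m
        grad1 = sum((h[i] - y[i]) * x[i] for i in range(m)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Toy data generated from y = 2x + 1; the result should be close to (1, 2).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
print(batch_gradient_descent(x, y, alpha=0.05, iterations=5000))
```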
The Gradient Descent used here is also called Batch Gradient Descent because it uses all the training examples in each step.