# Gradient Descent

So we have our hypothesis function and we have a way of measuring how accurate it is. Now what we need is a way to automatically improve our hypothesis function. That's where gradient descent comes in.

To get the most accurate hypothesis, we must **minimize the cost function**. (That is, we must find the values of $$θ\_0, θ\_1$$ at which the cost function attains its minimum.)

To do so, we take the derivative of our cost function. The derivative at a point is the slope of the line tangent to the function there, and it tells us which direction is downhill. We take steps in that downhill direction, with the step size controlled by the parameter $$α$$, called the **learning rate**.

The gradient descent equation is:

repeat until convergence:{

$$θ\_j:= θ\_j − α ∂/∂θ\_j J(θ\_0,θ\_1)$$

}

for j=0 and j=1, updating both parameters simultaneously (compute both new values before assigning either).

* If $$α$$ is too small, gradient descent is slow: it takes many steps to reach the minimum of $$J$$
* If $$α$$ is too large, each step may overshoot the minimum, so gradient descent can fail to converge or even diverge
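The effect of the learning rate can be seen on a toy cost function. The sketch below runs the update rule on $$J(θ) = θ^2$$ (whose derivative is $$2θ$$, with minimum at $$θ = 0$$); the function name and the specific α values are illustrative, not from the text.

```python
# Gradient descent on the toy cost J(θ) = θ², whose derivative is 2θ.
def descend(theta, alpha, steps):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta  # θ := θ − α dJ/dθ
    return theta

# Small α converges slowly: after 10 steps we are still far from θ = 0.
print(descend(1.0, alpha=0.01, steps=10))
# A moderate α converges: θ shrinks by a constant factor each step.
print(descend(1.0, alpha=0.1, steps=100))
# Too-large α overshoots: |θ| grows each step, so the iteration diverges.
print(descend(1.0, alpha=1.1, steps=10))
```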

The equation for gradient descent for *Linear Regression in One Variable* can be obtained by substituting the hypothesis function and the cost function into the gradient descent formula above and working out the partial derivatives. We get:

repeat until convergence:{

$$θ\_0:=θ\_0 − α (1/m) ∑\_{i=1}^{m}(h\_θ(x^{(i)})−y^{(i)})$$

$$θ\_1:=θ\_1 − α (1/m) ∑\_{i=1}^{m}((h\_θ(x^{(i)})−y^{(i)})x^{(i)})$$

}

The variant of gradient descent used here is also called *Batch Gradient Descent*, because each update step sums over all $$m$$ training examples.
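The two update rules above can be sketched in code. This is a minimal illustration, not a production implementation: the function name, the α value, and the toy data (generated from $$y = 2x + 1$$) are all assumptions made here for demonstration.

```python
# Batch gradient descent for linear regression in one variable.
# Each iteration uses all m training examples, and θ0, θ1 are
# updated simultaneously from the same batch of errors.
def batch_gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ with h_θ(x) = θ0 + θ1·x.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                                # (1/m) Σ error
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m     # (1/m) Σ error·x
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Toy data drawn from y = 2x + 1, so we expect θ0 ≈ 1 and θ1 ≈ 2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(batch_gradient_descent(xs, ys))
```

Note that both gradients are computed from the same `errors` list before either parameter is changed, which is exactly the simultaneous update the algorithm requires.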
