Machine Learning - Stanford - Coursera
1.0.0
  • Acknowledgements
  • Introduction
  • Linear Algebra Review
  • Types of Machine Learning
  • Supervised Learning
    • Linear Regression
      • Linear Regression in One Variable
        • Cost Function
        • Gradient Descent
      • Multivariate Linear Regression
        • Cost Function
        • Gradient Descent
        • Feature Scaling
        • Mean Normalization
        • Choosing the Learning Rate α
    • Polynomial Regression
      • Normal Equation
      • Gradient Descent vs. Normal Equation

Gradient Descent

So we have our hypothesis function and we have a way of measuring how accurate it is. Now what we need is a way to automatically improve our hypothesis function. That's where gradient descent comes in.

To get the most accurate hypothesis, we must minimize the cost function. (This means that we have to find values for $\theta_0, \theta_1$ such that the cost function $J(\theta_0, \theta_1)$ takes its minimum value.)

To do so, we take the derivative of our cost function. The derivative at a point is the slope of the line tangent to the function at that point, and it tells us which direction to move in. We take steps down the cost function in that direction, with the size of each step controlled by the parameter $\alpha$, called the learning rate.

The gradient descent equation is:

repeat until convergence: {

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$

}

for $j = 0$ and $j = 1$
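
A minimal sketch of this update loop in Python may make the pseudocode concrete. The gradient function `grad_J`, the default learning rate, the tolerance, and the iteration cap are illustrative assumptions rather than anything defined in these notes:

```python
import numpy as np

def gradient_descent(grad_J, theta, alpha=0.01, tol=1e-8, max_iters=10_000):
    """Generic gradient descent: repeat theta_j := theta_j - alpha * dJ/dtheta_j.

    grad_J(theta) must return the vector of partial derivatives of J at theta.
    """
    for _ in range(max_iters):
        step = alpha * grad_J(theta)
        theta = theta - step              # all theta_j are updated simultaneously
        if np.all(np.abs(step) < tol):    # crude "until convergence" test
            break
    return theta
```

For linear regression in one variable, `grad_J` would return the two partial derivatives given further below.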

  • If $\alpha$ is too small, gradient descent is slow, i.e. it takes many iterations to reach the minimum value of $J$.

  • If $\alpha$ is too large, gradient descent may overshoot the minimum, i.e. it may fail to converge, or even diverge (both behaviours are illustrated in the sketch after this list).
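
As a quick numerical illustration of both failure modes, here is a tiny Python experiment on the one-dimensional cost $J(\theta) = \theta^2$, whose minimum is at $\theta = 0$; the specific values of $\alpha$ are only examples:

```python
def minimize_quadratic(alpha, theta=10.0, iters=50):
    """Gradient descent on J(theta) = theta**2, whose derivative is 2*theta."""
    for _ in range(iters):
        theta = theta - alpha * 2 * theta
    return theta

print(minimize_quadratic(alpha=0.001))  # too small: ~9.05, barely moved after 50 steps
print(minimize_quadratic(alpha=0.1))    # reasonable: ~1.4e-4, essentially at the minimum
print(minimize_quadratic(alpha=1.5))    # too large: ~1e16, the iterates diverge
```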

The update equations of gradient descent for Linear Regression in One Variable can be obtained by substituting the hypothesis function and the cost function into the general gradient descent formula above. We get:

repeat until convergence: {

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)$$

$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}$$

}
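
For reference, this is where those two expressions come from: differentiating the squared-error cost from the Cost Function section, $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$ with $h_\theta(x) = \theta_0 + \theta_1 x$ (a sketch assuming that definition of $J$):

$$\frac{\partial}{\partial \theta_0} J = \frac{1}{2m} \sum_{i=1}^{m} 2\left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot 1 = \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)$$

$$\frac{\partial}{\partial \theta_1} J = \frac{1}{2m} \sum_{i=1}^{m} 2\left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot x^{(i)} = \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}$$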

The Gradient Descent used here is also called Batch Gradient Descent because it uses all the training examples in each step.
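
A minimal sketch of these updates in Python, to show the batch behaviour concretely (every iteration sums over all $m$ training examples); the synthetic data, learning rate, and iteration count below are illustrative assumptions:

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.05, num_iters=2000):
    """Batch gradient descent for h_theta(x) = theta_0 + theta_1 * x."""
    m = len(y)
    theta_0, theta_1 = 0.0, 0.0
    for _ in range(num_iters):
        predictions = theta_0 + theta_1 * x            # h_theta(x^(i)) for all i
        errors = predictions - y                       # h_theta(x^(i)) - y^(i)
        # Compute both gradients before updating, so theta_0 and theta_1
        # are updated simultaneously and each step uses all m examples.
        grad_0 = (1 / m) * np.sum(errors)
        grad_1 = (1 / m) * np.sum(errors * x)
        theta_0 -= alpha * grad_0
        theta_1 -= alpha * grad_1
    return theta_0, theta_1

# Toy data generated from y = 2 + 3x plus noise (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=100)
y = 2 + 3 * x + rng.normal(scale=0.5, size=100)
print(batch_gradient_descent(x, y))  # roughly (2, 3)
```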
