Logistic Regression as a Neural Network (Neural Network Basics)
Logistic Regression is a classification algorithm. It is most commonly used for Binary Classification.
Binary Classification is the task of assigning each item to exactly one of two categories/classes (e.g., labeling an image as "cat" or "not cat").
Some Notations
The number of training examples is denoted by m. A single training example is written as a pair (x, y), where x is the input (the features/attributes) and y is the output (the label).
Say we have a p×q image. For an RGB image, it carries p×q×3 intensity values (p×q pixels, each with 3 color channels). These p×q×3 values are unrolled into a single column vector x with p×q×3 rows. This is how one training image is represented.
We then create a matrix X that stores all m training images as columns. So X has p×q×3 rows and m columns.
A row vector Y stores the m training labels in the same column order, so it has dimensions 1×m.
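As a minimal NumPy sketch of this layout (the array names and sizes below are illustrative, not from the original notes):

```python
import numpy as np

# Illustrative sizes: m images, each p x q with 3 RGB channels.
m, p, q = 100, 64, 64
images = np.random.rand(m, p, q, 3)        # stand-in for real image data
labels = np.random.randint(0, 2, size=m)   # stand-in binary labels

# Unroll each image into a column of p*q*3 values and stack the
# m columns side by side to form X; reshape the labels into Y.
X = images.reshape(m, -1).T                # shape (p*q*3, m) = (12288, 100)
Y = labels.reshape(1, m)                   # shape (1, m)
```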
Logistic Regression Equation
We have a vector of features x. Given x, we want to predict $\hat{y}$, the probability that y = 1.
We also have parameters w (a weight vector with the same dimensions as x) and b (a scalar bias).
We compute $\hat{y}$ as follows:
$\hat{y} = \sigma(w^T x + b)$
where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid activation function, which restricts the output to the range (0, 1).
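A minimal sketch of this forward computation in NumPy (the helper names `sigmoid` and `predict` are my own, not from the notes); applied to the whole matrix X, it computes $\hat{y}$ for all m examples at once:

```python
import numpy as np

def sigmoid(z):
    """Map any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, X):
    """Compute y_hat = sigma(w^T x + b) for every column of X.

    w: (n_x, 1) weights, b: scalar bias, X: (n_x, m) feature matrix.
    Returns a (1, m) row of predicted probabilities that y = 1.
    """
    return sigmoid(np.dot(w.T, X) + b)
```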
Logistic Regression Cost Function
To train the parameters w and b, we need a cost function.
Let us consider the Loss Function L, which measures the error of the prediction on a single example:
$L(\hat{y}, y) = -\big(y \log \hat{y} + (1 - y) \log(1 - \hat{y})\big)$
Now, we denote the cost function as follows:
$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$
where the superscript (i) indexes the i-th of the m training examples.
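As a sketch, the cost can be computed in one vectorized pass, reusing the `sigmoid` helper from the earlier snippet (the function name `compute_cost` is illustrative):

```python
import numpy as np

def compute_cost(w, b, X, Y):
    """Average cross-entropy loss J(w, b) over all m examples.

    X: (n_x, m) features; Y: (1, m) labels in {0, 1}.
    """
    m = X.shape[1]
    Y_hat = sigmoid(np.dot(w.T, X) + b)   # (1, m) predictions
    losses = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    return np.sum(losses) / m
```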
Our aim now is to find w and b which minimize the cost function J. This is done using Gradient Descent.
Gradient Descent
Gradient Descent finds a minimum of a function by repeatedly stepping in the direction of steepest descent. Because the logistic regression cost J is convex, that minimum is the global one, so Gradient Descent lets us find optimal w and b values that minimize J.
The update rule is:
Repeat {
$w := w - \alpha \frac{\partial J(w, b)}{\partial w}$
$b := b - \alpha \frac{\partial J(w, b)}{\partial b}$
}
where α is the learning rate.
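A sketch of the full training loop (the function name `gradient_descent` and the hyperparameter defaults are my own). The gradient expressions used below are the standard closed-form derivatives of J for logistic regression: $\frac{\partial J}{\partial w} = \frac{1}{m} X (\hat{Y} - Y)^T$ and $\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})$.

```python
import numpy as np

def gradient_descent(w, b, X, Y, alpha=0.01, num_iters=1000):
    """Repeatedly update w and b in the direction that decreases J."""
    m = X.shape[1]
    for _ in range(num_iters):
        Y_hat = sigmoid(np.dot(w.T, X) + b)   # forward pass, shape (1, m)
        dw = np.dot(X, (Y_hat - Y).T) / m     # dJ/dw, shape (n_x, 1)
        db = np.sum(Y_hat - Y) / m            # dJ/db, scalar
        w = w - alpha * dw                    # gradient descent step
        b = b - alpha * db
    return w, b
```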