Logistic Regression as a Neural Network (Neural Network Basics)
Logistic Regression is a classification algorithm. It is most commonly used for Binary Classification.
Binary Classification is the task of assigning each item to exactly one of two categories/classes (e.g., labeling an image as "cat" or "not cat").
Some Notations
The number of training examples is denoted by m. A single training example is written as a pair (x, y), where x is the input (the features/attributes) and y is the output (the label).
Say we have a p×q image. For an RGB image, it carries p×q×3 intensity values (p×q pixels, each with 3 color channels). These p×q×3 values are unrolled into a single column vector x with p×q×3 rows. This is how one training image is represented.
We then create a matrix X that stores all m training images as columns. So X has p×q×3 rows and m columns.
A row vector Y stores the m training labels in the same column order, so it has dimensions 1×m.
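As a minimal NumPy sketch of this layout (the array names and sizes below are illustrative, not from the original notes):

```python
import numpy as np

# Illustrative sizes: m images, each p x q with 3 RGB channels.
m, p, q = 100, 64, 64
images = np.random.rand(m, p, q, 3)        # stand-in for real image data
labels = np.random.randint(0, 2, size=m)   # stand-in binary labels

# Unroll each image into a column of p*q*3 values and stack the
# m columns side by side to form X; reshape the labels into Y.
X = images.reshape(m, -1).T                # shape (p*q*3, m) = (12288, 100)
Y = labels.reshape(1, m)                   # shape (1, m)
```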
Logistic Regression Equation
We have a vector of features x. Given x, we want to predict $\hat{y}$, the probability that y = 1.
We also have parameters w (a weight vector with the same dimensions as x) and b (a scalar bias).
We compute $\hat{y}$ as follows:
$\hat{y} = \sigma(w^T x + b)$
where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid activation function, which restricts the output to the range (0, 1).
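A minimal sketch of this forward computation in NumPy (the helper names `sigmoid` and `predict` are my own, not from the notes); applied to the whole matrix X, it computes $\hat{y}$ for all m examples at once:

```python
import numpy as np

def sigmoid(z):
    """Map any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, X):
    """Compute y_hat = sigma(w^T x + b) for every column of X.

    w: (n_x, 1) weights, b: scalar bias, X: (n_x, m) feature matrix.
    Returns a (1, m) row of predicted probabilities that y = 1.
    """
    return sigmoid(np.dot(w.T, X) + b)
```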
Logistic Regression Cost Function
To train the parameters w and b, we need a cost function.
Let us consider the Loss Function L, which measures the error of the prediction on a single example:
$L(\hat{y}, y) = -\big(y \log \hat{y} + (1 - y) \log(1 - \hat{y})\big)$
Now, we denote the cost function as follows:
$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$
where the superscript (i) indexes the i-th of the m training examples.
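As a sketch, the cost can be computed in one vectorized pass, reusing the `sigmoid` helper from the earlier snippet (the function name `compute_cost` is illustrative):

```python
import numpy as np

def compute_cost(w, b, X, Y):
    """Average cross-entropy loss J(w, b) over all m examples.

    X: (n_x, m) features; Y: (1, m) labels in {0, 1}.
    """
    m = X.shape[1]
    Y_hat = sigmoid(np.dot(w.T, X) + b)   # (1, m) predictions
    losses = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    return np.sum(losses) / m
```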
Our aim now is to find w and b which minimize the cost function J. This is done using Gradient Descent.
Gradient Descent
Gradient Descent finds a minimum of a function by repeatedly stepping in the direction of steepest descent. Because the logistic regression cost J is convex, that minimum is the global one, so Gradient Descent lets us find optimal w and b values that minimize J.
The update rule is:
Repeat {
$w := w - \alpha \frac{\partial J(w, b)}{\partial w}$
$b := b - \alpha \frac{\partial J(w, b)}{\partial b}$
}
where α is the learning rate.
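A sketch of the full training loop (the function name `gradient_descent` and the hyperparameter defaults are my own). The gradient expressions used below are the standard closed-form derivatives of J for logistic regression: $\frac{\partial J}{\partial w} = \frac{1}{m} X (\hat{Y} - Y)^T$ and $\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})$.

```python
import numpy as np

def gradient_descent(w, b, X, Y, alpha=0.01, num_iters=1000):
    """Repeatedly update w and b in the direction that decreases J."""
    m = X.shape[1]
    for _ in range(num_iters):
        Y_hat = sigmoid(np.dot(w.T, X) + b)   # forward pass, shape (1, m)
        dw = np.dot(X, (Y_hat - Y).T) / m     # dJ/dw, shape (n_x, 1)
        db = np.sum(Y_hat - Y) / m            # dJ/db, scalar
        w = w - alpha * dw                    # gradient descent step
        b = b - alpha * db
    return w, b
```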