# Logistic Regression as a Neural Network (Neural Network Basics)

Logistic Regression is a classification algorithm, most commonly used for Binary Classification.

Binary Classification is the categorization of items into one of two categories/classes; for example, deciding whether an image contains a cat (y = 1) or not (y = 0).

## Some Notations

The number of training examples is denoted by **m**. Each training example is a pair (x, y), where **x** is the input (a vector of features/attributes) and **y** is the corresponding label.

Say we have a p×q image. For an RGB image, it has p×q pixels with three color-channel values each, i.e. p×q×3 values in total. These values are stored in the form of a single column vector **x** having p×q×3 rows. This is how a training image is represented.

We then create a matrix **X** that stores all m training images as vectors, column-wise. So **X** has p×q×3 rows and m columns.

Another vector **Y** is used to store the m training labels, column-wise. So it has dimensions 1×m.
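
As a concrete illustration, here is a minimal NumPy sketch of this representation, assuming small illustrative values for p, q, and m and randomly generated images (all variable names are hypothetical, not from the original notes):

```python
import numpy as np

p, q = 64, 64   # image height and width (illustrative values)
m = 100         # number of training examples

# m random RGB images, each of shape (p, q, 3)
images = np.random.rand(m, p, q, 3)

# Unroll each image into a column vector of p*q*3 rows,
# then stack the m vectors column-wise to form X.
X = images.reshape(m, -1).T               # shape: (p*q*3, m)

# One binary label per example, stored as a 1 x m row vector.
Y = np.random.randint(0, 2, size=(1, m))

print(X.shape)   # (12288, 100)
print(Y.shape)   # (1, 100)
```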

## Logistic Regression Equation

We have a vector of features **x**. Given x, we need to predict the probability $$\hat{y}$$ that y=1.

We also have parameters **w** (a weight vector with the same dimension as x) and **b** (a scalar bias).

We denote $$\hat{y}$$ as follows:

$$\hat{y} = \sigma(w^T x + b)$$

where $$\sigma$$ is the sigmoid activation function, $$\sigma(z) = \frac{1}{1 + e^{-z}}$$, which squashes any real-valued input into the range (0, 1).
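
A minimal sketch of this prediction step in NumPy, assuming a small random dataset and zero-initialized parameters (the `sigmoid` helper and all variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative setup: 4 features, 5 examples.
X = np.random.rand(4, 5)     # each column is one example x
w = np.zeros((4, 1))         # weight vector, same dimension as x
b = 0.0                      # scalar bias

# Vectorized over all examples: w.T @ X + b has shape (1, 5),
# so y_hat[0, i] is the predicted probability that y = 1
# for the i-th example.
y_hat = sigmoid(w.T @ X + b)
print(y_hat)                 # all 0.5, since w = 0 and b = 0
```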

## Logistic Regression Cost Function

To train the parameters w and b, we need a cost function.

Let us consider the Loss Function **L** that denotes the error in the prediction for a given example as follows:

$$L(\hat{y}, y) = -\left(y \log \hat{y} + (1-y) \log(1-\hat{y})\right)$$

Now, we denote the cost function as follows:

$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$$

where the superscript (i) indexes the m training examples: $$\hat{y}^{(i)}$$ is the prediction for the i-th example and $$y^{(i)}$$ is its true label.
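
Below is a hedged NumPy sketch of this cost computation, assuming the predictions and labels are stored as 1×m arrays as described above (the `cost` function name is illustrative):

```python
import numpy as np

def cost(Y_hat, Y):
    """Average cross-entropy loss over m examples.

    Y_hat: (1, m) predicted probabilities that y = 1
    Y:     (1, m) true binary labels
    """
    m = Y.shape[1]
    losses = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    return np.sum(losses) / m

# A confident correct prediction costs little; a confident
# wrong one costs a lot.
Y = np.array([[1, 0]])
Y_hat = np.array([[0.9, 0.8]])
print(cost(Y_hat, Y))   # ~0.857 = (-log 0.9 - log 0.2) / 2
```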

Our aim now is to find w and b which minimize the cost function J. This is done using **Gradient Descent**.

## Gradient Descent

Gradient Descent iteratively moves w and b in the direction that decreases J. Because the logistic regression cost function J is convex, this procedure converges toward the global minimum, letting us find optimal w and b values.

It is written as follows:

Repeat {

$$w := w - \alpha\frac{\partial J(w,b)}{\partial w}$$

$$b := b - \alpha\frac{\partial J(w,b)}{\partial b}$$

}

where $$\alpha$$ is the **learning rate**, which controls the size of each update step.
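
Tying the pieces together, here is a minimal sketch of the full training loop on a toy dataset. The gradient expressions used below, $$\frac{\partial J}{\partial w} = \frac{1}{m} X (\hat{Y} - Y)^T$$ and $$\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})$$, come from the standard derivation of the derivatives of J (not derived in these notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 2 features, 4 examples; class 1 iff x1 > x2.
X = np.array([[0.1, 0.9, 0.2, 0.8],
              [0.8, 0.1, 0.9, 0.2]])
Y = np.array([[0, 1, 0, 1]])
m = X.shape[1]

w = np.zeros((2, 1))
b = 0.0
alpha = 0.5                          # learning rate

for _ in range(1000):
    Y_hat = sigmoid(w.T @ X + b)     # forward pass, shape (1, m)
    dZ = Y_hat - Y                   # per-example dJ/dz
    dw = (X @ dZ.T) / m              # dJ/dw, shape (2, 1)
    db = np.sum(dZ) / m              # dJ/db, scalar
    w -= alpha * dw                  # gradient descent updates
    b -= alpha * db

print(sigmoid(w.T @ X + b))          # probabilities approach Y
```

Note that the loop runs over iterations of Gradient Descent, not over training examples: all m examples are processed at once through matrix operations, which is what makes the implementation efficient.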

