Logistic Regression

Logistic regression is a classification model. Like other linear discriminant methods, it is used for real-valued attributes only.

Consider a problem with 2 classes, $C_1$ and $C_2$.

We assume that the log likelihood ratio (the log odds) is a linear function of the input $x$:
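$$
\log \frac{P(C_1 \mid x)}{P(C_2 \mid x)} = w^T x + w_0
$$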

Proof:
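For two classes, $P(C_2 \mid x) = 1 - P(C_1 \mid x)$, so the assumption says that the logit of the posterior is linear:

$$
\operatorname{logit}\big(P(C_1 \mid x)\big) = \log \frac{P(C_1 \mid x)}{1 - P(C_1 \mid x)} = w^T x + w_0
$$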

Since the sigmoid is the inverse of the logit, we get:
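$$
y = P(C_1 \mid x) = \operatorname{sigmoid}(w^T x + w_0) = \frac{1}{1 + \exp\left[-(w^T x + w_0)\right]}
$$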

Learning the Weights

Consider a two-class sample $X = \{x^t, r^t\}_{t=1}^{N}$, where $r^t = 1$ if $x^t \in C_1$ and $r^t = 0$ if $x^t \in C_2$.
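Treating $r^t$ given $x^t$ as Bernoulli with parameter $y^t = P(C_1 \mid x^t)$, maximizing the likelihood of the sample is equivalent to minimizing the cross-entropy error:

$$
E(w, w_0 \mid X) = -\sum_{t=1}^{N} \Big[ r^t \log y^t + (1 - r^t) \log (1 - y^t) \Big]
$$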

There is no closed-form solution for the weights that minimize this error, so we use an iterative technique known as Gradient Descent.

Gradient Descent

Note that the cross-entropy error above is convex in the weights: it has no local minima, just one global minimum, so gradient descent does not get stuck in a suboptimal solution.

Differentiating $E$ with respect to each weight $w_j$ (using $dy/d\theta = y(1 - y)$ for the sigmoid), we get the update rule:
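$$
\Delta w_j = -\eta \frac{\partial E}{\partial w_j} = \eta \sum_{t=1}^{N} \left( r^t - y^t \right) x_j^t, \qquad j = 0, 1, \dots, d
$$

where $\eta$ is the learning rate and $x_0^t = 1$, so that the bias $w_0$ is updated like any other weight.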

Pseudocode for Gradient Descent for Logistic Regression

# assume x_0^t = 1 for all t, so w_0 is treated like any other weight

for j = 0...d:
    w_j = rand(-0.01, 0.01)
repeat:
    for j = 0...d:
        delta_w_j = 0
    for t = 1...N:
        theta = 0
        for j = 0...d:
            theta = theta + w_j * x_j^t  # theta = w^T x^t + w_0, since x_0^t = 1
        y = sigmoid(theta)
        for j = 0...d:
            delta_w_j = delta_w_j + (r^t - y) * x_j^t
    for j = 0...d:
        w_j = w_j + eta * delta_w_j  # eta is the learning rate
until convergence
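
As a concrete illustration, below is a minimal NumPy sketch of the same batch update. The names (`fit_logistic`, `n_epochs`, the toy data) are illustrative assumptions, not from the original notes, and a fixed epoch count stands in for the convergence test.

import numpy as np

def sigmoid(theta):
    # Clip theta to avoid overflow in exp for large |theta|
    return 1.0 / (1.0 + np.exp(-np.clip(theta, -30.0, 30.0)))

def fit_logistic(X, r, eta=0.01, n_epochs=1000):
    """Batch gradient descent for two-class logistic regression.

    X : (N, d) array of real-valued attributes
    r : (N,) array of labels, 1 for class C1 and 0 for class C2
    """
    N, d = X.shape
    Xb = np.hstack([np.ones((N, 1)), X])  # prepend x_0 = 1 so w[0] is the bias w_0
    w = np.random.uniform(-0.01, 0.01, size=d + 1)
    for _ in range(n_epochs):
        y = sigmoid(Xb @ w)           # y^t = P(C1 | x^t) for every sample t
        delta_w = Xb.T @ (r - y)      # delta_w_j = sum_t (r^t - y^t) x_j^t
        w += eta * delta_w
    return w

# Toy usage (hypothetical data): two well-separated Gaussian blobs
X = np.vstack([np.random.randn(50, 2) + 2.0, np.random.randn(50, 2) - 2.0])
r = np.hstack([np.ones(50), np.zeros(50)])
w = fit_logistic(X, r)

In practice, one would stop when the change in $E$ or the size of the update $\Delta w$ falls below a threshold, rather than after a fixed number of epochs.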
