Model Selection
This refers to selecting an appropriate model for the task at hand.
For example, for regression: should we use Linear Regression? Polynomial Regression of degree 2? Polynomial Regression of degree 3? And so on.
In very low dimensions, we may be able to visualize/plot the data. We may then be tempted to compute the squared errors for Linear Regression and Polynomial Regression and choose the one that performs better on the training data. However, this should not be done! The squared error may be low on the training data but extremely high on the test data; this is the hallmark of overfitting.
Instead, we must perform cross-validation (a minimal sketch follows this list):
divide the dataset into training and validation sets
train each model on the training set
compute the error of the resulting g on the validation set
choose the model that minimizes the error on the validation set
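Here is a minimal sketch of this procedure in Python; the scikit-learn pipeline, the synthetic data, and the candidate degrees are illustrative assumptions, not part of the original notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + 0.3 * rng.standard_normal(100)

# Divide the dataset into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_degree, best_error = None, float("inf")
for degree in [1, 2, 3, 4, 5]:  # candidate models
    g = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    g.fit(X_train, y_train)  # train each model on the training set
    error = mean_squared_error(y_val, g.predict(X_val))  # error of g on the validation set
    if error < best_error:  # keep the model that minimizes the validation error
        best_degree, best_error = degree, error

print(f"chosen model: Polynomial Regression of degree {best_degree}")
```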
Regularization

If the number of input variables is large (i.e. the dimension d is large), then Linear Regression learns a large number of coefficients. Sometimes these coefficients turn out to be absurdly large or small, which is a sign of overfitting.
In such cases, we can set some coefficients to 0, to simplify g.
We must find the hypothesis (i.e. the linear function) that minimizes the regularized error function given by:

$$E_{\text{reg}}(g) = \text{error on data} + \lambda \cdot \text{model complexity}$$

where the error on data can be the squared error, $\lambda$ is a tunable parameter called the regularization parameter, and the model complexity (for a linear function with coefficients $w_0, w_1, \dots, w_d$) can be given by $\sum_{j} w_j^2$. The value of $\lambda$ can be a default value or can be determined using cross-validation.
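As a concrete sketch in plain numpy (the function names are assumptions; the closed form below is ridge regression, the standard minimizer when the error on data is the squared error and the complexity is the sum of squared coefficients):

```python
import numpy as np

def regularized_error(w, X, y, lam):
    # E_reg(w) = squared error on the data + lam * model complexity
    residuals = X @ w - y
    squared_error = np.sum(residuals ** 2)  # error on data
    complexity = np.sum(w ** 2)             # sum of squared coefficients
    return squared_error + lam * complexity

def fit_regularized(X, y, lam):
    # Closed-form minimizer of the objective above (ridge regression):
    # w* = (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```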
It can be shown that the hypothesis that minimizes this regularized error function is the MAP hypothesis, for a suitable prior.
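As a sketch of why this holds, assume Gaussian noise on the targets and a zero-mean Gaussian prior on the coefficients (both distributional choices are assumptions made here to illustrate the correspondence):

$$h_{\text{MAP}} = \arg\max_{w} \, p(D \mid w)\, p(w) = \arg\min_{w} \left[ -\ln p(D \mid w) - \ln p(w) \right]$$

With $y_i = w \cdot x_i + \varepsilon_i$, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ and $w_j \sim \mathcal{N}(0, \tau^2)$, this becomes

$$h_{\text{MAP}} = \arg\min_{w} \left[ \sum_i (y_i - w \cdot x_i)^2 + \frac{\sigma^2}{\tau^2} \sum_j w_j^2 \right]$$

which is exactly the regularized error with squared error on the data and $\lambda = \sigma^2 / \tau^2$.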
Linear Discriminant Functions

Linear discriminant functions are mainly used for classification problems.

Think of a linear discriminant as computing a 'score' for an example, which is a weighted sum of its attributes.
In a generative approach, we attempt to learn/model distributions p(x|+) and p(x|-).
In a discriminative approach, we don't learn/model p(x|+) and p(x|-). We only attempt to discriminate between + and -.
Say we have 3 classes $c_1, c_2, c_3$.

The score for class $i$ is given by $\text{score}_i(x) = w_i \cdot x$, where $w_i = (w_{i0}, w_{i1}, \dots, w_{id})$ is the weight vector for class $i$. It can be computed using $w_i \cdot x = \sum_{j=0}^{d} w_{ij} x_j$, with $x_0 = 1$ so that $w_{i0}$ acts as a bias term.

Given $x$, we must predict the class for $x$ based on the class that maximizes the score, i.e. the class $\arg\max_i \, w_i \cdot x$ (a small illustration follows).
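A small numpy illustration of scoring and prediction; the weight values and the example $x$ below are made up for illustration:

```python
import numpy as np

# One weight vector per class, over d = 2 attributes plus a bias weight.
# These numbers are illustrative, not learned values.
W = np.array([
    [ 0.5,  1.0, -2.0],   # w_1 = (w_10, w_11, w_12)
    [-1.0,  0.5,  1.5],   # w_2
    [ 0.2, -1.0,  0.5],   # w_3
])

x = np.array([1.0, 2.0, 3.0])  # x_0 = 1 prepended for the bias term

scores = W @ x                     # score_i(x) = w_i . x for each class
predicted = np.argmax(scores) + 1  # classes are 1-indexed: c_1, c_2, c_3
print(scores, f"predict c_{predicted}")
```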
We must learn weight vectors $w_i$ that can hopefully make accurate predictions on new examples. Each linear discriminant function linearly separates the examples that belong to class $i$ from those that do not (say, examples above the line belong to the class and examples below the line do not).
Consider a problem with two classes (+) and (-).
Let $\text{score}(x) = w \cdot x$.

Choose class $+$ if $w \cdot x > 0$. Otherwise, choose $-$.
The Logit or Log Odds function is given by $\text{logit}(y) = \ln \frac{y}{1 - y}$ for $0 < y < 1$. Its inverse is the logistic function, also called the sigmoid function, i.e. $\sigma(z) = \frac{1}{1 + e^{-z}}$.
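A quick numpy check that the two functions are inverses of each other (the function names are illustrative):

```python
import numpy as np

def logit(y):
    # Log odds: ln(y / (1 - y)), defined for 0 < y < 1.
    return np.log(y / (1.0 - y))

def sigmoid(z):
    # Logistic (sigmoid) function: 1 / (1 + e^(-z)), the inverse of logit.
    return 1.0 / (1.0 + np.exp(-z))

y = np.array([0.1, 0.5, 0.9])
print(np.allclose(sigmoid(logit(y)), y))  # True: sigmoid inverts logit
```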