# Model Selection

This refers to selecting an appropriate model for the task at hand.

For example, for regression: should we use Linear Regression? Polynomial Regression of degree 2? Of degree 3? And so on.

## Cross-Validation

In very low dimensions, we may be able to visualize/plot the data. We may be tempted to compute the squared errors for Linear Regression and Polynomial Regression and choose the one that performs better. However, **this should not be done**! This is because the squared error may be low on the training data, but may turn out to be extremely high on the test data.

Instead, we must perform **cross-validation**:

* divide the dataset into training and validation sets
* train each model on the training set
* compute the error of the resulting hypothesis g on the validation set
* choose the model that minimizes the error on the validation set
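The steps above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical noisy 1-D dataset and NumPy's polynomial fitting; all data and the candidate degrees are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D dataset: a noisy quadratic (purely illustrative).
x = rng.uniform(-3, 3, size=60)
y = 1.5 * x**2 - x + rng.normal(scale=2.0, size=x.size)

# Divide the dataset into training and validation sets.
idx = rng.permutation(x.size)
train, val = idx[:40], idx[40:]

def validation_error(degree):
    # Train the model (fit a polynomial) on the training set only...
    coeffs = np.polyfit(x[train], y[train], degree)
    # ...then compute its squared error on the held-out validation set.
    residuals = np.polyval(coeffs, x[val]) - y[val]
    return np.mean(residuals**2)

# Choose the model that minimizes the validation error.
degrees = [1, 2, 3, 5, 9]
errors = {d: validation_error(d) for d in degrees}
best = min(errors, key=errors.get)
print("validation MSE by degree:", errors)
print("selected model: degree", best)
```

Note that the training error alone would keep decreasing as the degree grows; the validation error is what exposes the overfit high-degree models.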

## Regularization

If the number of input variables is large (i.e. large dimension d), then Linear Regression learns a lot of coefficients. Sometimes, these coefficients can be absurdly large or small. This is a sign of overfitting.

In such cases, we can drive some coefficients to 0, to simplify g.

We must find the hypothesis (i.e. the linear function) that minimizes the **regularized error function** given by:

$$E' = (\text{error on data}) + \lambda \cdot (\text{model complexity})$$\
where 'error on data' can be the squared error, $$\lambda$$ is a tunable parameter called the **regularization parameter**, and the model complexity (for a linear function) can be given by $$\sum_{i=1}^d |w_i|$$.\
The value of $$\lambda$$ can be a default value or can be determined using cross-validation.

It can be shown that the hypothesis that minimizes $$E'$$ is the MAP hypothesis, for a suitable prior.
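The regularized error is straightforward to compute directly. A minimal sketch, assuming a squared data error and an L1 complexity term; the data matrix and weight vector below are made up for illustration:

```python
import numpy as np

def regularized_error(X, y, w, lam):
    """Regularized error: (error on data) + lambda * (model complexity).

    The data error is the squared error of the linear hypothesis
    g(x) = w^T x; the complexity term is sum_i |w_i| (an L1 penalty).
    """
    data_error = np.sum((X @ w - y) ** 2)
    complexity = np.sum(np.abs(w))
    return data_error + lam * complexity

# Illustrative data: w = [1, 2] fits these targets exactly.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = np.array([5.0, 4.0, 9.0])
w = np.array([1.0, 2.0])

# With lam = 0 only the data error counts; a larger lam
# penalizes the same weights by lam * sum|w_i|.
print(regularized_error(X, y, w, 0.0))
print(regularized_error(X, y, w, 10.0))
```

Larger values of $$\lambda$$ shift the minimizer toward smaller (and, with the L1 term, sparser) weight vectors.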

## Linear Discriminant Analysis

This is mainly used for classification problems.

Think of it as computing a 'score' for an example: a weighted sum of its attributes.

Say we have 3 classes $$C_1, C_2, C_3$$.

The score for class i is given by $$g_i(x|w_i, w_{i0}) = w_{i2}x_2 + w_{i1}x_1 + w_{i0}$$ where $$x=\begin{bmatrix}x_1\\x_2\end{bmatrix}$$.\
In vector form, $$g_i(x)=w_i^Tx+w_{i0}$$.

Given $$g_1(x), g_2(x), g_3(x)$$, we predict the class for x as the one that maximizes $$g_i(x)$$, i.e. $$\arg\max_i g_i(x)$$.
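This scoring-and-argmax rule is a few lines of NumPy. The weight vectors below are made up for illustration, not learned from data:

```python
import numpy as np

# Rows are the weight vectors w_1, w_2, w_3 of the three
# linear discriminants g_i(x) = w_i^T x + w_i0 (illustrative values).
W = np.array([[ 1.0, -0.5],
              [-0.3,  0.8],
              [ 0.2,  0.2]])
w0 = np.array([0.1, -0.2, 0.0])  # biases w_10, w_20, w_30

def predict(x):
    scores = W @ x + w0                 # g_i(x) for i = 1, 2, 3
    return int(np.argmax(scores)) + 1   # classes are 1-indexed

x = np.array([2.0, 1.0])
# g_1 = 1.6, g_2 = 0.0, g_3 = 0.6, so the prediction is class 1
print(predict(x))
```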

We must learn a $$g_i(x)$$ that can hopefully make accurate predictions on new examples. This **linear discriminant function** linearly separates the examples that belong to class i from those that do not (say, examples above the line belong to the class and examples below the line do not).

![](/files/-M5-0Sn0w9FitSzu4EeV)

Consider a problem with two classes $$C_1$$ (+) and $$C_2$$ (-).

In a **generative approach**, we attempt to learn/model distributions p(x|+) and p(x|-).

In a **discriminative approach**, we don't learn/model p(x|+) and p(x|-). We only attempt to discriminate between + and -.

Let $$y \equiv P(C_1|x)$$ and $$1-y = P(C_2|x)$$.

Choose class $$C_1$$ if $$y > 0.5$$, equivalently if $$\frac{y}{1-y}>1$$ or $$\log\left(\frac{y}{1-y}\right)>0$$. Otherwise, choose $$C_2$$.

The **Logit** or **Log Odds** function is given by $$\mathrm{logit}(y)=\log\left(\frac{y}{1-y}\right)$$ for $$0<y<1$$.\
Its inverse is the logistic function, also called the **sigmoid** function: $$\mathrm{sigmoid}(z)=\frac{1}{1+e^{-z}}$$.
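The inverse relationship is easy to check numerically; a minimal sketch using only the standard library:

```python
import math

def logit(y):
    # Log odds, defined for 0 < y < 1.
    return math.log(y / (1 - y))

def sigmoid(z):
    # Logistic function, the inverse of logit.
    return 1 / (1 + math.exp(-z))

# sigmoid undoes logit for any probability in (0, 1).
for y in (0.1, 0.5, 0.9):
    assert abs(sigmoid(logit(y)) - y) < 1e-12

# The decision boundary: log odds 0 corresponds to probability 0.5.
print(sigmoid(0))
```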

