# Probabilistic Modeling

This refers to models of data generation, i.e. **generative models**: they model how the data is generated.

Consider spam classification. We would learn the probability distribution for the examples in the spam class as well as the probability distribution for the examples in the non-spam (ham) class.

Given a new sample x, we must then calculate P(x is spam).\
By default, we label it as spam if P(x is spam)>=0.5 i.e. if P(x is spam)>=P(x is ham).

More generally, we must compute P(C|X), i.e. the probability of a class C given an example X, which is the probability that X belongs to C.

## Bayes' Rule

According to Bayes' Rule,

$$P(C|X) = \frac{P(C)P(X|C)}{P(X)}$$

P(C) is the **prior** probability, P(X|C) is the **likelihood** (the probability of X being generated from C), and P(X) is known as the **evidence**. P(C|X) is called the **posterior** probability.
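
As a supplementary note, the evidence can be written out with the law of total probability, assuming the hypotheses $$C\_i$$ are mutually exclusive and together cover all possibilities (the index i is notation introduced here, not in the original notes):

$$P(X) = \sum\_{i} P(C\_i)P(X|C\_i)$$

Under that assumption, P(X) is the same whichever hypothesis we are evaluating, and dividing by it makes the posteriors sum to 1.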

C is the **hypothesis** and X is the **data**.

The prior probability is computed without having seen the data X.

For example:

X - I am late to class\
$$C\_1$$ - I was kidnapped by Martians\
$$C\_2$$ - I was thinking about research and I lost track of time

Say, P($$C\_1$$) = 0.00000...1. We assume that these are the only two hypotheses. Therefore, P($$C\_2$$)=0.999999....9.\
The probability that I was late to class *given* that the Martians kidnapped me is pretty high. So, say P(X|$$C\_1$$)=0.97. Now, the probability that I was late to class *given* that I was thinking about research and lost track of time is also pretty high. So, say P(X|$$C\_2$$)=0.5.

We use Bayes' Rule to find the most probable hypothesis. In other words, given X, we choose the hypothesis that maximizes P(C|X), i.e. $$argmax\_C$$ P(C|X). This is called the **Maximum a-posteriori (MAP)** hypothesis, where a-posteriori means 'after' seeing the data.

$$argmax\_C P(C|X) = argmax\_C \frac{P(C)P(X|C)}{P(X)} = argmax\_C P(C)P(X|C)$$ since the denominator P(X) does not depend on C.

In our example, $$P(C\_1|X) \propto P(C\_1)P(X|C\_1) = 0.0000...1 \times 0.97$$ and $$P(C\_2|X) \propto P(C\_2)P(X|C\_2) = 0.999...9 \times 0.5$$

Since the second product is far larger than the first, we choose hypothesis $$C\_2$$.
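
The comparison can be checked numerically with a minimal Python sketch. The notes give the prior only as 0.00000...1, so the value `1e-9` below is a hypothetical stand-in chosen for illustration. The names `priors`, `likelihoods`, `scores`, and `chosen` are also assumptions of this sketch.

```python
# Hypothetical stand-in for the prior "0.00000...1"; the exact value is not given in the notes.
PRIOR_MARTIANS = 1e-9

priors = {"C1": PRIOR_MARTIANS, "C2": 1.0 - PRIOR_MARTIANS}
likelihoods = {"C1": 0.97, "C2": 0.5}  # P(X | C) from the example

# Unnormalized posteriors: P(C) * P(X | C)
scores = {c: priors[c] * likelihoods[c] for c in priors}

# Evidence P(X), assuming C1 and C2 are the only two hypotheses
evidence = sum(scores.values())

# Normalized posteriors P(C | X)
posteriors = {c: s / evidence for c, s in scores.items()}

# The MAP hypothesis maximizes P(C) * P(X | C); dividing by the evidence does not change the winner
chosen = max(scores, key=scores.get)
print(chosen, posteriors)
```

Any sufficiently small prior for $$C\_1$$ gives the same choice, because the likelihood ratio 0.97 / 0.5 is far too small to offset it.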

The **Maximum Likelihood** (ML) hypothesis is given by $$argmax\_C P(X|C)$$.

Note that if we have a **uniform prior** (distribution) i.e. all the hypotheses are equally likely, the MAP and ML hypotheses are the same.
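
A short sketch can illustrate how the two hypotheses differ and when they coincide. It reuses the likelihoods from the late-to-class example. The skewed prior `1e-9` is a hypothetical stand-in for the elided value, and the helper names `pick_map` and `pick_ml` are made up for this illustration.

```python
def pick_map(priors, likelihoods):
    """Return the hypothesis maximizing P(C) * P(X | C)."""
    return max(priors, key=lambda c: priors[c] * likelihoods[c])

def pick_ml(likelihoods):
    """Return the hypothesis maximizing P(X | C), ignoring the prior."""
    return max(likelihoods, key=likelihoods.get)

likelihoods = {"C1": 0.97, "C2": 0.5}          # P(X | C) from the example
skewed_prior = {"C1": 1e-9, "C2": 1.0 - 1e-9}  # hypothetical stand-in values
uniform_prior = {"C1": 0.5, "C2": 0.5}

ml_choice = pick_ml(likelihoods)                         # looks at the likelihood only
map_skewed = pick_map(skewed_prior, likelihoods)         # the prior dominates
map_uniform = pick_map(uniform_prior, likelihoods)       # the prior is a constant factor

print(ml_choice, map_skewed, map_uniform)
```

With the skewed prior, the ML and MAP choices disagree. With the uniform prior, P(C) is the same factor for every hypothesis, so the two choices agree.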

### Continuous Distributions and Bayes' Rule

For a continuous random variable X, we cannot use P(X) directly: X takes values from a continuum rather than a fixed set of discrete values, so the probability of any single exact value is zero.

Instead, we use the pdf of X, i.e. its probability density function, denoted p(X). Visually speaking, it is the height of the density curve at X.
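
As a supplementary note, a density becomes a probability only when integrated over an interval. For any a ≤ b (symbols introduced here for illustration):

$$P(a \le X \le b) = \int\_a^b p(x)\,dx$$

This is why p(X) can exceed 1 at a point even though probabilities cannot: only the area under the curve is constrained, and the total area is 1.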

Say we have two probability distributions, one for men's heights and the other for women's heights. They are assumed to be Normal/Gaussian distributions.

Say, $$C\_1: man, C\_2: woman; P(C\_1) = P(C\_2) = 0.5$$. So, ML hypothesis = MAP hypothesis.

For a continuous random variable X, Bayes' Rule is as follows:

$$P(C|X) = \frac{P(C)p(X|C)}{p(X)}$$

The MAP hypothesis = $$argmax\_C P(C|X) = argmax\_C P(C)p(X|C)$$
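
The heights example can be sketched in Python as follows. The notes do not give the parameters of the two Gaussians, so the means and standard deviations below (in cm) are hypothetical values chosen only for illustration. The function names are likewise assumptions of this sketch.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a Normal(mean, std^2) distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical (mean, std) in cm for each class; not taken from the notes
params = {"man": (175.0, 7.0), "woman": (162.0, 6.5)}
priors = {"man": 0.5, "woman": 0.5}  # P(C1) = P(C2) = 0.5, as in the text

def posterior(height):
    """Return P(C | X = height) for each class via Bayes' Rule with densities."""
    scores = {c: priors[c] * gaussian_pdf(height, *params[c]) for c in params}
    evidence = sum(scores.values())  # p(X), assuming these are the only two classes
    return {c: s / evidence for c, s in scores.items()}

def map_class(height):
    """Return the class with the highest posterior probability."""
    post = posterior(height)
    return max(post, key=post.get)

print(map_class(180.0), posterior(180.0))
```

Because the priors are equal here, the class with the larger density p(X|C) at the given height is also the MAP class, matching the earlier observation that ML and MAP coincide under a uniform prior.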

