Probabilistic Modeling

Probabilistic modeling refers to models of data generation, i.e. generative models: they model where the data comes from.

Consider spam classification. We learn one probability distribution for the examples in the spam class and another for the examples in the non-spam (ham) class.

Given a new sample x, we then calculate P(x is spam). By default, we label it as spam if P(x is spam) >= 0.5, which in the two-class case is equivalent to P(x is spam) >= P(x is ham).
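As a minimal sketch in Python, assuming some model has already produced the posterior P(x is spam) (the numeric values below are purely illustrative):

```python
def classify(p_spam: float) -> str:
    # Default decision rule: label as spam when the posterior is at least 0.5.
    # Since P(x is ham) = 1 - P(x is spam) in the two-class case, this is
    # the same as checking P(x is spam) >= P(x is ham).
    return "spam" if p_spam >= 0.5 else "ham"

print(classify(0.73))  # spam
print(classify(0.41))  # ham
```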

More generally, we must compute P(C|X), i.e. the probability of a class C given an example X, i.e. the probability of X belonging to C.

Bayes' Rule

According to Bayes' Rule,

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

P(C) is the prior probability, P(X|C) is the likelihood, i.e. the probability of X being generated from C, and P(X) is known as the evidence. P(C|X) is called the posterior probability.
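To make the pieces concrete, here is a toy computation in Python. Assume a single binary feature X (say, a particular word appears in the email); the prior and likelihood values are made-up numbers for illustration, not estimates from real data:

```python
# Toy Bayes' Rule computation with made-up numbers (illustrative only).
p_spam = 0.3                  # prior P(C = spam)
p_ham = 1.0 - p_spam          # prior P(C = ham)
p_x_given_spam = 0.8          # likelihood P(X | spam)
p_x_given_ham = 0.1           # likelihood P(X | ham)

# Evidence P(X): total probability of observing the feature at all.
p_x = p_x_given_spam * p_spam + p_x_given_ham * p_ham

# Posterior P(spam | X) via Bayes' Rule.
p_spam_given_x = p_x_given_spam * p_spam / p_x
print(f"P(spam | X) = {p_spam_given_x:.3f}")  # ~0.774, so label as spam
```

Note that the evidence P(X) only normalizes the posterior; it is the same for every class, so it does not affect which class wins.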

C is the hypothesis and X is the data.

The prior probability is computed without having seen the data X.

For example, the maximum a posteriori (MAP) hypothesis maximizes the posterior: C_MAP = argmax_C P(C|X) = argmax_C P(X|C)P(C), since the evidence P(X) is the same for every C. The maximum likelihood (ML) hypothesis maximizes only the likelihood: C_ML = argmax_C P(X|C).

Note that if we have a uniform prior (distribution), i.e. all the hypotheses are equally likely, the MAP and ML hypotheses are the same: the constant prior factors out of the argmax.
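A small sketch of this in Python, with hypothetical likelihood and prior values chosen only to illustrate the point:

```python
# Hypothetical likelihoods P(X | C) for three candidate hypotheses.
likelihood = {"C1": 0.2, "C2": 0.5, "C3": 0.3}

def map_hypothesis(prior):
    # C_MAP = argmax_C P(X | C) P(C); P(X) is constant across C, so it is dropped.
    return max(likelihood, key=lambda c: likelihood[c] * prior[c])

def ml_hypothesis():
    # C_ML = argmax_C P(X | C)
    return max(likelihood, key=lambda c: likelihood[c])

uniform = {c: 1 / 3 for c in likelihood}
skewed = {"C1": 0.7, "C2": 0.2, "C3": 0.1}

print(ml_hypothesis())           # C2
print(map_hypothesis(uniform))   # C2 -- same as ML under a uniform prior
print(map_hypothesis(skewed))    # C1 -- a strong prior can override the likelihood
```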

Continuous Distributions and Bayes' Rule

For continuous distributions, i.e. for a continuous random variable X, we cannot compute P(X) directly: X does not take values from a fixed discrete set, and the probability of X taking any single exact value is 0.

Instead, we compute the probability density function (pdf) of X, denoted p(X). Visually speaking, it is the height of the curve at X.

Say we have two probability distributions, one for men's heights and the other for women's heights. Both are assumed to be Normal/Gaussian distributions.

For a continuous random variable X, Bayes' Rule is as follows:

$$P(C \mid x) = \frac{p(x \mid C)\,P(C)}{p(x)}, \qquad p(x) = \sum_{C'} p(x \mid C')\,P(C')$$

The likelihood p(x|C) and the evidence p(x) are now densities, while the prior P(C) and the posterior P(C|x) are still ordinary probabilities.
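As a sketch under assumed parameters (the means, standard deviations, and the 50/50 priors below are illustrative, not measured values), the heights example can be coded with Gaussian densities standing in for the likelihood terms:

```python
from scipy.stats import norm

# Assumed (illustrative) Gaussian parameters for heights in cm.
mu_m, sigma_m = 178.0, 7.0    # men
mu_w, sigma_w = 165.0, 7.0    # women
p_m, p_w = 0.5, 0.5           # priors P(man), P(woman)

def p_woman_given_height(x: float) -> float:
    # Bayes' Rule with densities: p(x | C) is the height of the Gaussian
    # curve at x, and the evidence p(x) is the prior-weighted mixture.
    lik_m = norm.pdf(x, mu_m, sigma_m)
    lik_w = norm.pdf(x, mu_w, sigma_w)
    return lik_w * p_w / (lik_m * p_m + lik_w * p_w)

print(p_woman_given_height(160.0))  # close to 1: more likely a woman
print(p_woman_given_height(185.0))  # close to 0: more likely a man
```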
