Bayesian Approach to Parameter Estimation

Treat $\Theta$ as a random variable with prior $p(\Theta)$.

According to Bayes' rule, $p(\Theta|X) = \frac{p(\Theta)\,p(X|\Theta)}{p(X)}$.

  • The ML estimate is given by:

    $\Theta_{ML} = \arg\max_\Theta \, p(X|\Theta)$

  • The MAP estimate is given by:

    $\Theta_{MAP} = \arg\max_\Theta \, p(X|\Theta)\,p(\Theta)$ (with a flat prior this reduces to the ML estimate; see the sketch after this list)

  • The Bayes Estimate is given by:

    $\Theta_{BAYES} = E[\Theta|X] = \int_\Theta \Theta \, p(\Theta|X) \, d\Theta$ (the integral becomes a summation for discrete values)
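To make the relationship between MAP and ML explicit, here is a one-line derivation (a standard observation, under the assumption of a flat prior $p(\Theta) = c$ over the parameter space):

```latex
% Assuming a flat prior p(\Theta) = c (constant) over the parameter space:
\begin{align*}
\Theta_{MAP} &= \arg\max_\Theta \, p(X|\Theta)\,p(\Theta) \\
             &= \arg\max_\Theta \, c\,p(X|\Theta) \\
             &= \arg\max_\Theta \, p(X|\Theta) = \Theta_{ML}
\end{align*}
```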

Example with a Discrete Prior on $\Theta$

Consider a parameterized distribution that is uniform on $[0,\Theta]$.

Say the discrete prior on $\Theta$ is given by:

$P(\Theta=1) = 2/3$

$P(\Theta=2) = 1/3$

Suppose $X = \{0.5, 1.3, 0.7\}$. Since the sample $1.3$ cannot have come from a uniform on $[0,1]$, we know that $P(\Theta=1|X) = 0$ and therefore $P(\Theta=2|X) = 1$. So the ML, MAP, and Bayes hypotheses are all 2.

Now, suppose $X = \{0.5, 0.7, 0.1\}$. Since the uniform density on $[0,1]$ is 1 at each sample point, $p(X|\Theta=1) = 1^3 = 1$.

$p(X|\Theta=2) = (1/2)^3 = 1/8$

So, $p(X) = P(\Theta=1)\,p(X|\Theta=1) + P(\Theta=2)\,p(X|\Theta=2) = \frac{2}{3}\cdot 1 + \frac{1}{3}\cdot\frac{1}{8} = \frac{51}{72}$

Therefore, $P(\Theta=1|X) = \frac{p(X|\Theta=1)\,P(\Theta=1)}{p(X)} = \frac{48}{51}$ and $P(\Theta=2|X) = \frac{3}{51}$.

In this case, both the ML and the MAP hypotheses are 1. The Bayes estimate is $E[\Theta|X] = 1\cdot\frac{48}{51} + 2\cdot\frac{3}{51} = \frac{54}{51} \approx 1.06$.

The posterior (predictive) density of a new point $x$ given $X$ is:

$p(x=0.82|X) = P(\Theta=1|X)\,p(x=0.82|\Theta=1) + P(\Theta=2|X)\,p(x=0.82|\Theta=2)$

$= \frac{48}{51}\cdot 1 + \frac{3}{51}\cdot\frac{1}{2} = \frac{99}{102}$
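A minimal Python sketch of this discrete example (the variable names and the helper `uniform_pdf` are illustrative, not from the notes); it simply reproduces the posterior, the three estimates, and the predictive density computed above:

```python
from fractions import Fraction

# Discrete prior on Theta for a Uniform[0, Theta] likelihood.
prior = {1: Fraction(2, 3), 2: Fraction(1, 3)}

def uniform_pdf(x, theta):
    """Density of Uniform[0, theta] at x."""
    return Fraction(1, theta) if 0 <= x <= theta else Fraction(0)

def likelihood(X, theta):
    """p(X | Theta = theta) for i.i.d. samples."""
    p = Fraction(1)
    for x in X:
        p *= uniform_pdf(x, theta)
    return p

X = [Fraction(1, 2), Fraction(7, 10), Fraction(1, 10)]   # {0.5, 0.7, 0.1}

# Posterior p(Theta | X) via Bayes' rule.
evidence = sum(prior[t] * likelihood(X, t) for t in prior)            # p(X) = 51/72
posterior = {t: prior[t] * likelihood(X, t) / evidence for t in prior}

theta_ml = max(prior, key=lambda t: likelihood(X, t))                 # 1
theta_map = max(prior, key=lambda t: posterior[t])                    # 1
theta_bayes = sum(t * posterior[t] for t in prior)                    # 54/51 ≈ 1.06

# Posterior predictive density at a new point x = 0.82.
x_new = Fraction(82, 100)
predictive = sum(posterior[t] * uniform_pdf(x_new, t) for t in prior) # 99/102

print(posterior, theta_ml, theta_map, theta_bayes, predictive)
```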

Example with a Continuous Prior on $\Theta$

Assume the data $X$ is drawn from a Gaussian with a known variance $\sigma^2$ and an unknown mean $\mu$ (this is now the $\Theta$).

Assume a Gaussian prior on $\Theta$, i.e. $\Theta \sim N(\mu_0, \sigma_0^2)$, where $\mu_0$ and $\sigma_0^2$ are known.

Then $X$ is generated from $N(\Theta, \sigma^2)$ (this $\Theta$ is the mean of the Gaussian from which $X$ was drawn; it is what we need to estimate).

Given X, we have:

$\Theta_{ML} = m$ (i.e. the sample mean) $= \frac{\sum_t x^t}{N}$

$\Theta_{MAP} = \frac{N/\sigma^2}{N/\sigma^2 + 1/\sigma_0^2}\,m + \frac{1/\sigma_0^2}{N/\sigma^2 + 1/\sigma_0^2}\,\mu_0$

$\Theta_{BAYES} = \Theta_{MAP}$! (The posterior here is itself Gaussian, so its mean and its mode coincide.)

As $N \rightarrow \infty$, $m$ dominates the weighted sum of $m$ and $\mu_0$, so the MAP (and Bayes) estimate approaches the ML estimate.
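A short Python sketch of this Gaussian case (the values of `sigma`, `mu0`, `sigma0`, and the simulated data are illustrative assumptions, not from the notes); it computes $\Theta_{ML}$ and $\Theta_{MAP}$ with the formulas above and shows $\Theta_{MAP}$ moving toward $m$ as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Known quantities (illustrative values).
sigma = 2.0              # known data standard deviation
mu0, sigma0 = 0.0, 1.0   # Gaussian prior N(mu0, sigma0^2) on the unknown mean Theta

true_theta = 3.0         # the mean we are trying to estimate

for N in (5, 50, 5000):
    X = rng.normal(true_theta, sigma, size=N)
    m = X.mean()                                  # Theta_ML: the sample mean

    # Theta_MAP (= Theta_BAYES here): precision-weighted combination of m and mu0.
    w_data = N / sigma**2
    w_prior = 1 / sigma0**2
    theta_map = (w_data * m + w_prior * mu0) / (w_data + w_prior)

    print(f"N={N:5d}  Theta_ML={m:.3f}  Theta_MAP={theta_map:.3f}")
```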

Estimates of Mean and Variance of a Distribution (not just Gaussian)

The ML estimate of the mean is $m$, i.e. the sample mean.

The ML estimate of the variance is $\frac{\sum_t (x^t-m)^2}{N}$ (this is biased, since $E[\hat\sigma^2] < \sigma^2$).

Note that this variance estimate is, on average, lower than the true value because we use the sample mean $m$ to compute it instead of the true mean.

However, $\frac{\sum_t (x^t-m)^2}{N-1}$ is an unbiased estimate.
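A small Python check of this bias (purely illustrative; the sampling distribution, sample size, and trial count are assumptions): averaging the two estimators over many samples shows that the $N$ denominator underestimates the true variance while the $N-1$ denominator does not.

```python
import numpy as np

rng = np.random.default_rng(1)

true_var = 4.0       # variance of the sampling distribution (assumed)
N = 10               # a small sample size makes the bias visible
trials = 100_000

samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, N))

var_ml = samples.var(axis=1, ddof=0)         # divides by N   (ML / biased)
var_unbiased = samples.var(axis=1, ddof=1)   # divides by N-1 (unbiased)

print(f"true variance      : {true_var}")
print(f"mean of ML estimate: {var_ml.mean():.3f}")        # ≈ (N-1)/N * true_var = 3.6
print(f"mean of N-1 version: {var_unbiased.mean():.3f}")  # ≈ 4.0
```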
