Treat Θ as a random variable with prior p(Θ).
According to Bayes' Rule, p(Θ∣X) = p(X∣Θ)p(Θ) / p(X)
The ML estimate is given by:
Θ_ML = argmax_Θ p(X∣Θ)
The MAP estimate is given by:
Θ_MAP = argmax_Θ p(X∣Θ)p(Θ)
The Bayes Estimate is given by:
Θ_Bayes = E[Θ∣X] = ∫ Θ p(Θ∣X) dΘ (the integral becomes a summation when Θ is discrete)
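To make the three definitions concrete, here is a minimal sketch for the case where Θ ranges over a discrete set of candidates (the function name ml_map_bayes and the use of numpy are my own choices, not part of the notes; for a continuous Θ, a fine grid of candidates approximates the integral by a sum):

```python
import numpy as np

def ml_map_bayes(thetas, prior, likelihood):
    """Given candidate Θ values, a prior P(Θ), and likelihoods p(X|Θ),
    return the ML, MAP, and Bayes estimates of Θ."""
    posterior = likelihood * prior
    posterior = posterior / posterior.sum()          # divide by p(X) to normalize

    theta_ml    = thetas[np.argmax(likelihood)]      # argmax_Θ p(X|Θ)
    theta_map   = thetas[np.argmax(posterior)]       # argmax_Θ p(X|Θ)p(Θ)
    theta_bayes = np.sum(thetas * posterior)         # E[Θ|X]
    return theta_ml, theta_map, theta_bayes
```

Feeding in Θ ∈ {1, 2}, the prior (2/3, 1/3), and the likelihoods from the example below reproduces the hand-derived ML, MAP, and Bayes values.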
Example with a Discrete Prior on Θ
Consider a parameterized distribution uniform on [0,Θ].
Say the discrete prior on Θ is given by:
P(Θ=1)=2/3
P(Θ=2)=1/3
Suppose X={0.5,1.3,0.7}
Given X, we know that P(Θ=1∣X)=0 (a uniform on [0,1] cannot produce the point 1.3), and therefore P(Θ=2∣X)=1. So the ML, MAP, and Bayes hypotheses are all 2.
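This case can be checked mechanically. Below is a small sketch (the helper name uniform_likelihood is mine) evaluating p(X∣Θ) for the uniform model, which is (1/Θ)^N when every point lies in [0,Θ] and 0 otherwise:

```python
import numpy as np

def uniform_likelihood(X, theta):
    # p(X|Θ) for i.i.d. samples from uniform[0, Θ]:
    # zero if any point falls outside [0, Θ], otherwise (1/Θ)^N
    X = np.asarray(X)
    if np.any((X < 0) | (X > theta)):
        return 0.0
    return (1.0 / theta) ** len(X)

X = [0.5, 1.3, 0.7]
print(uniform_likelihood(X, 1))   # 0.0   -> P(Θ=1|X) = 0
print(uniform_likelihood(X, 2))   # 0.125 -> all posterior mass on Θ=2
```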
Now, suppose X={0.5,0.7,0.1}
p(X∣Θ=1) = 1³ = 1
p(X∣Θ=2) = (1/2)³ = 1/8
So, p(X)=P(Θ=1)p(X∣Θ=1)+P(Θ=2)p(X∣Θ=2)=51/72
Therefore, P(Θ=1∣X) = p(X∣Θ=1)P(Θ=1) / p(X) = 48/51 and P(Θ=2∣X) = 3/51
In this case, both the ML and MAP hypotheses are 1.
The Bayes' hypothesis can be computed as E[Θ∣X] = 1·(48/51) + 2·(3/51) = 54/51 ≈ 1.06
The posterior (predictive) density of a new point x given X is:
p(x=0.82∣X)=p(Θ=1∣X)p(x=0.82∣Θ=1)+p(Θ=2∣X)p(x=0.82∣Θ=2)
= (48/51)·1 + (3/51)·(1/2) = 99/102
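The arithmetic above can be reproduced exactly with Python's fractions module (a sketch under the same uniform-on-[0,Θ] model; the variable names are mine):

```python
from fractions import Fraction

X = [Fraction(5, 10), Fraction(7, 10), Fraction(1, 10)]     # {0.5, 0.7, 0.1}
prior = {1: Fraction(2, 3), 2: Fraction(1, 3)}

# p(X|Θ) = (1/Θ)^N, since every point already lies inside [0, Θ] for Θ = 1 and Θ = 2
likelihood = {t: Fraction(1, t) ** len(X) for t in prior}    # {1: 1, 2: 1/8}

p_X = sum(prior[t] * likelihood[t] for t in prior)           # 17/24 (= 51/72)
posterior = {t: prior[t] * likelihood[t] / p_X for t in prior}
print(posterior)                                             # {1: 16/17, 2: 1/17}, i.e. 48/51 and 3/51

theta_bayes = sum(t * posterior[t] for t in posterior)
print(theta_bayes, float(theta_bayes))                       # 18/17 (= 54/51) ≈ 1.06

# Posterior predictive density at x = 0.82 (0.82 < 1, so p(x|Θ) = 1/Θ for both Θ)
p_new = sum(posterior[t] * Fraction(1, t) for t in posterior)
print(p_new)                                                 # 33/34 (= 99/102)
```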
Example with a Continuous Prior on Θ
Assume the data X is drawn from a Gaussian with a known variance σ² and an unknown mean μ (this is now the Θ).
Assume a Gaussian prior on Θ, i.e. Θ ∼ N(μ₀, σ₀²), where μ₀ and σ₀² are known.
Then, X is generated from N(Θ, σ²) (this Θ is the mean of the Gaussian from which X was drawn; it is what we need to estimate).
Given X, we have:
Θ_ML = m (i.e. the sample mean) = (1/N) ∑_t x_t
Θ_MAP = (N/σ²)/(N/σ² + 1/σ₀²) · m + (1/σ₀²)/(N/σ² + 1/σ₀²) · μ₀
Θ_Bayes = Θ_MAP! (The posterior here is Gaussian, so its mean coincides with its mode.)
As N→∞, m dominates the weighted sum of m and μ₀, so both estimates approach Θ_ML.
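A small numerical sketch of these formulas (the prior parameters, the known σ, and the simulated data are made-up values for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

mu0, sigma0 = 0.0, 2.0              # prior on Θ: N(μ0, σ0²)
sigma = 1.0                         # known standard deviation of the data
N = 20
X = rng.normal(1.5, sigma, size=N)  # data generated with an (unknown to us) mean of 1.5

theta_ml = X.mean()                 # Θ_ML: the sample mean m

# Θ_MAP: precision-weighted average of the sample mean m and the prior mean μ0
w_data, w_prior = N / sigma**2, 1 / sigma0**2
theta_map = (w_data * theta_ml + w_prior * mu0) / (w_data + w_prior)

print(theta_ml, theta_map)          # Θ_MAP is pulled slightly toward μ0; as N grows it approaches Θ_ML
# Θ_Bayes equals Θ_MAP here, since the Gaussian posterior's mean is also its mode.
```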
Estimates of Mean and Variance of a Distribution (not just Gaussian)
The ML estimate for the mean is m, i.e. the sample mean.
The ML estimate of variance is (1/N) ∑_t (x_t − m)² (this is biased since E[σ̂²] < σ²).
Note that this estimate is, on average, lower than the actual variance because we compute it using the sample mean m rather than the true mean.
However, (1/(N−1)) ∑_t (x_t − m)² is an unbiased estimate.
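The bias can be seen empirically; here is a sketch using numpy's ddof argument (the simulation parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
true_var, N = 4.0, 5

biased, unbiased = [], []
for _ in range(100_000):
    x = rng.normal(0.0, np.sqrt(true_var), size=N)
    biased.append(x.var(ddof=0))     # divides by N: the ML estimate, biased low
    unbiased.append(x.var(ddof=1))   # divides by N - 1: unbiased

print(np.mean(biased))    # ≈ (N-1)/N · true_var = 3.2
print(np.mean(unbiased))  # ≈ true_var = 4.0
```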