CS-GY 6923: Machine Learning
Parametric Estimation

This section discusses how to estimate the parameters of a distribution, e.g. $\mu, \sigma^2$ of a Gaussian, or of a model, e.g. $w_0, w_1$ of the line $f(x) = w_0 + w_1 x$.

We denote the parameters by $\Theta = (\mu, \sigma^2)$.

The likelihood of $\Theta$ given a sample $X$ is given by:

$l(\Theta|X) = \prod_t p(x^t|\Theta)$

Therefore, the log-likelihood of $\Theta$ given a sample $X$ is:

$L(\Theta|X) := \log l(\Theta|X) = \sum_t \log p(x^t|\Theta)$

This assumes that the observations in $X$ are independent and identically distributed (i.i.d.).

The Maximum Likelihood Estimator (MLE) is given by:

$\Theta^* := \arg\max_\Theta L(\Theta|X)$
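
To make the definitions concrete, here is a minimal numerical sketch: it evaluates the log-likelihood of a made-up Gaussian sample (with known variance $\sigma^2 = 1$, so $\Theta = \mu$) over a grid of candidate means and takes the argmax. The sample, grid range, and function name are hypothetical, not part of the course material.

```python
import numpy as np

# A minimal sketch of the MLE recipe above: L(Theta | X) = sum_t log p(x^t | Theta),
# maximized over a grid of candidate parameters. Here Theta is the mean mu of a
# Gaussian with known variance sigma^2 = 1; the sample is made up.
rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.0, size=50)        # sample {x^t}, t = 1..N

def log_likelihood(mu, X, sigma2=1.0):
    # sum_t log N(x^t | mu, sigma2)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - (X - mu) ** 2 / (2 * sigma2))

grid = np.linspace(0.0, 4.0, 401)                  # candidate values of Theta = mu
theta_star = grid[np.argmax([log_likelihood(mu, X) for mu in grid])]
print(theta_star, X.mean())                        # the argmax is (close to) the sample mean
```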

Estimating the Parameter $p_h$ of a Bernoulli Distribution

X is a Bernoulli Random Variable.

$p_h = P[X = 1]$

For example, consider the following:

Let 1 denote Heads and 0 denote Tails, and say $X = \{1, 1, 0\}$. We need to determine $\Theta$, i.e. $p_h$.

We have $l(p_h|X) = P(X|p_h) = p_h \cdot p_h \cdot (1 - p_h)$

More generally, for $X = \{x^t\}_{t=1}^N$, we have:

$p(X|p_h) = \prod_{t=1}^N p_h^{x^t} (1 - p_h)^{(1 - x^t)}$

It can be proved that the MLE is given by $p_h = \frac{\sum_t x^t}{N}$.
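
A small sketch (reusing the coin-flip sample $X = \{1, 1, 0\}$ from above) that computes this closed-form MLE and checks that the same value maximizes the Bernoulli log-likelihood over a grid; the grid itself is just for illustration.

```python
import numpy as np

X = np.array([1, 1, 0])                      # the coin-flip sample (1 = Heads, 0 = Tails)

# Closed-form MLE: fraction of heads
p_hat = X.sum() / len(X)                     # 2/3

# Sanity check: the same value maximizes the Bernoulli log-likelihood
grid = np.linspace(0.01, 0.99, 99)
loglik = X.sum() * np.log(grid) + (len(X) - X.sum()) * np.log(1 - grid)
print(p_hat, grid[np.argmax(loglik)])        # both are ~0.67
```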

Estimating the Parameters of a Multinomial Distribution

Consider a die with 6 faces numbered from 1 to 6.

If X is a Multinomial Random Variable, there are k>2 possible values of X (here, 6).

Say X={5, 4, 6}. We can imagine indicator vectors for each observation as [0 0 0 0 1 0], [0 0 0 1 0 0] and [0 0 0 0 0 1].

Say $X = \{4, 6, 4, 2, 3, 3\}$. The MLE of $p_i$, the probability that side $i$ shows up, is given by:

$p_i = \frac{\sum_{t=1}^N x_i^t}{N}$, where $x_i^t = 1$ if roll $t$ shows side $i$ and $0$ otherwise.
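
As an illustration, here is a short sketch computing these estimates for the die-roll sample above, building the indicator vectors explicitly (the variable names are hypothetical):

```python
import numpy as np

X = np.array([4, 6, 4, 2, 3, 3])             # observed die rolls from the example
K = 6                                        # number of faces

# Indicator (one-hot) vectors x^t, as described above: row t has a 1 at the face rolled
onehot = np.eye(K, dtype=int)[X - 1]         # shape (N, K)

# MLE of p_i: fraction of rolls that showed face i
p_hat = onehot.sum(axis=0) / len(X)
print(p_hat)                                 # [0, 1/6, 2/6, 2/6, 0, 1/6]
```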

Estimating the Parameters of a Gaussian Distribution

The MLE for the mean $m$ is $\frac{\sum_t x^t}{N}$ and the MLE for the variance $\sigma^2$ is $\frac{\sum_t (x^t - m)^2}{N}$.

However, if we divide by $N - 1$ instead of $N$ when computing the variance, we obtain the unbiased estimate.
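
A brief sketch contrasting the two variance estimates on a made-up sample; note that NumPy's np.var divides by $N$ by default (the MLE) and by $N - 1$ when called with ddof=1 (the unbiased estimate):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=20)        # a made-up Gaussian sample

m = X.mean()                                       # MLE of the mean
var_mle = ((X - m) ** 2).sum() / len(X)            # MLE of the variance (divide by N)
var_unbiased = ((X - m) ** 2).sum() / (len(X) - 1) # unbiased estimate (divide by N-1)

print(var_mle, np.var(X))                          # np.var defaults to dividing by N
print(var_unbiased, np.var(X, ddof=1))             # ddof=1 divides by N-1
```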
