# Probabilistic Clustering

Clusters form because of one simple reason: the points are being drawn from different distributions. Points that come from the same distribution tend to get clustered together.

Say we have data for men's heights and women's heights. The data points will clearly be from two Gaussian distributions.

Sometimes, we do not know how many distributions the data points were drawn from. So, we could experiment with different k values.

Say we have two distributions with means $$\mu\_1,\mu\_2$$ and standard deviations $$\sigma\_1,\sigma\_2$$.\
Then, $$p(X|\mu\_1,\mu\_2,\sigma\_1,\sigma\_2)=\Pi\_{x\in X}\[\frac{1}{2}(\frac{1}{\sqrt{2\pi}\sigma\_1}e^{-\frac{(x-\mu\_1)^2}{2\sigma\_1^2}}) + \frac{1}{2}(\frac{1}{\sqrt{2\pi}\sigma\_2}e^{-\frac{(x-\mu\_2)^2}{2\sigma\_2^2}})]$$

We need to estimate the parameters $$\mu\_1,\mu\_2$$ for the Mixture of the Gaussians (i.e. a **Gaussian Mixture Model** (GMM)), assuming we have a known $$\sigma ,,(\sigma\_1=\sigma\_2=\sigma)$$ and a uniform prior.

$$argmax\_{\mu\_1,\mu\_2},, p(X|\mu\_1,\mu\_2)$$

To do so, we use an approach called **Expectation Maximization** (EM)

## EM in the Context of GMMs

Here, we discuss EM in the context of a Mixture of 2 Gaussians, with the assumptions that we have a known $$\sigma ,,(\sigma\_1=\sigma\_2=\sigma)$$ and a uniform prior.

Assume each data point is actually a triple of the form $$(x,z\_1,z\_2)$$, where x is given but $$z\_1, z\_2$$ are unobservable.\
$$z\_1=1$$ if the point was generated from $$G\_1$$(Gaussian 1), and 0 otherwise. Similarly for $$z\_2$$.

### Algorithm

1. Pick random initial values for $$h=(\mu\_1,\mu\_2)$$
2. **E Step**:

   for j=1 to 2:

   calculate the expected value $$E\[z\_j]$$ for each x, assuming that the current hypothesis is correct

   $$E\[z\_1]=1.p(x,, was,, generated,, from,, dist.,, 1)+0.p(x,, was,, generated,, from,, dist.,, 2)$$

   $$= p(x,, was,, generated,, from,, dist.,, 1) = P(G\_1|x)$$

   in general, $$E\[z\_j]= P(G\_j|x)=\frac{p(x|G\_j)P(G\_j)}{p(x)}$$

   here, $$E\[z\_1]=\frac{p(x|G\_1).\frac{1}{2}}{p(x|G\_1).\frac{1}{2}+p(x|G\_2).\frac{1}{2}}$$ (since we assume uniform prior, $$P(G\_1)=P(G\_2)=\frac{1}{2}$$) = $$\frac{p(x|G\_1)}{p(x|G\_1)+p(x|G\_2)}$$

   and $$E\[z\_2]= \frac{p(x|G\_2)}{p(x|G\_1)+p(x|G\_2)}$$. So, $$E\[z\_1]+E\[z\_2]=1$$.
3. **M Step**: for j= 1 to 2: update the means, using a weighted average: $$\mu\_j = \frac{\sum\_{x\in X} x.P(G\_j|x)}{\sum\_x P(G\_j|x)}$$
4. Repeat 2 and 3 until convergence.

EM may not converge to a global optimum, but will at least converge to a local optimum, in a reasonable amount of time.

**Note**: This is soft-clustering. If we wanted to perform hard clustering, we would make $$z\_1$$ or $$z\_2$$ equal to 0 or 1, depending on whichever was bigger than the other, for each example in each iteration.

## Why Use K Means instead of EM?

* It does not make any underlying assumptions about the distributions from which the data was drawn. So, if we aren't convinced that our data comes from Gaussians, we may want to steer clear from EM.
* Also, EM gives a soft clustering by default, while KNN gives a hard clustering.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://vikram-bajaj.gitbook.io/cs-gy-6923-machine-learning/main-4/types-of-machine-learning/unsupervised-learning/probabilistic-clustering.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.