# Mean Normalization

In addition to scaling the features, it is common to also apply Mean Normalization.

Here, we replace $$x\_i$$ with $$x\_i-μ\_i$$ so that each feature has approximately zero mean.

(**Note**: This is not applied to $$x\_0$$, which has the fixed value 1.)

In general, we can use the following formula to scale the features using mean normalization:

$$x\_i = (x\_i-μ\_i)/S\_i$$

where $$x\_i$$ is the $$i^{th}$$ feature, $$μ\_i$$ is its mean, and $$S\_i$$ is its range (i.e. max − min).

This puts each $$x\_i$$ approximately in the range \[-0.5, 0.5], which helps gradient descent converge quickly.
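The formula above can be sketched in NumPy; the feature matrix `X` below is a hypothetical example (not from the original text), with rows as training examples and columns as features:

```python
import numpy as np

# Hypothetical training data: each row is an example,
# each column is a feature (e.g. house size, number of rooms).
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])

mu = X.mean(axis=0)                 # per-feature mean μ_i
S = X.max(axis=0) - X.min(axis=0)   # per-feature range S_i = max - min

# Mean normalization: x_i := (x_i - μ_i) / S_i
X_norm = (X - mu) / S

# Each column of X_norm now has mean 0 and lies roughly in [-0.5, 0.5].
print(X_norm.mean(axis=0))
```

Note that if the model uses an intercept column of ones ($$x\_0$$), that column would be left out of this computation, since normalizing a constant is meaningless (its range is 0).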
