Mean Normalization
In addition to scaling the features, we can also apply mean normalization.
Here, we replace xi with xi−μi so that each feature has a mean of approximately 0.
(Note: This is not applied to x0, which has a fixed value of 1.)
In general, we can use the following formula to scale the features using mean normalization:
xi=(xi−μi)/Si
where xi is the ith feature, μi is its mean over the training set, and Si is its range (i.e. max−min). The standard deviation can also be used for Si.
With Si taken as the range, each xi ends up approximately in [-0.5, 0.5]. Because all features are then on a similar scale, gradient descent usually converges in fewer iterations.
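As an illustration, the formula can be sketched in a few lines of plain Python. The function name mean_normalize and the sample house sizes are made up for this example, and the handling of a constant feature (mapping it to zeros) is an assumption, not part of the formula itself.

```python
def mean_normalize(column):
    """Apply (x - mean) / (max - min) to every value of one feature column."""
    mean = sum(column) / len(column)
    value_range = max(column) - min(column)
    if value_range == 0:
        # Assumption: a constant feature has no spread, so map it to zeros
        # rather than dividing by zero.
        return [0.0 for _ in column]
    return [(x - mean) / value_range for x in column]


# Hypothetical feature values: house sizes in square feet.
sizes = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
scaled = mean_normalize(sizes)
print(scaled)  # [-0.5, -0.25, 0.0, 0.25, 0.5]
```

Here the mean is 2000 and the range is 2000, so the scaled values are centered on 0 and span exactly [-0.5, 0.5]. For less symmetric data the values still cover an interval of width 1, but it may be shifted to one side of 0. The bias feature x0 is left out of this step, as noted above.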