The idea is to start with a higher learning rate and decrease it as training progresses.
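As a rough illustration of a decreasing learning rate, the sketch below uses inverse-time decay. The schedule, the function name `decayed_learning_rate`, and the `decay` constant are assumptions chosen for the example; the notes do not prescribe a particular schedule.

```python
def decayed_learning_rate(initial_rate, step, decay=0.01):
    """Inverse-time decay (an assumed schedule): the rate shrinks as step grows."""
    return initial_rate / (1.0 + decay * step)


# Learning rate at the start, after 100 steps, and after 1000 steps.
rates = [decayed_learning_rate(0.1, step) for step in (0, 100, 1000)]
```

Other schedules (step decay, exponential decay) follow the same pattern: a large rate early on, a smaller one later.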
$$\Delta w_i^t = -\eta \frac{\partial E^t}{\partial w_i} + \alpha \Delta w_i^{t-1}$$
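Here $t$ is the time step (training iteration), $\eta$ is the learning rate, and $\alpha$ is the momentum coefficient, which can be set to a default value or computed.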
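The update rule can be sketched in a few lines of plain Python. The helper name `momentum_step`, the values `eta=0.1` and `alpha=0.9`, and the toy error function $E(w) = w^2$ are assumptions made for illustration only.

```python
def momentum_step(weights, grads, prev_delta, eta=0.1, alpha=0.9):
    """One update: delta_t = -eta * dE/dw + alpha * delta_{t-1}."""
    delta = [-eta * g + alpha * d for g, d in zip(grads, prev_delta)]
    new_weights = [w + d for w, d in zip(weights, delta)]
    return new_weights, delta


# Toy problem (assumed): minimise E(w) = w^2, whose gradient is 2w.
w, delta = [1.0], [0.0]
for _ in range(200):
    grad = [2.0 * w[0]]
    w, delta = momentum_step(w, grad, delta)
```

The previous change `delta` is carried from one step to the next, so each update is pushed further in the direction the weights were already moving.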
Too much training could cause overfitting. Stop training when the validation error starts to increase.
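A minimal sketch of this stopping rule, assuming the validation error is recorded once per epoch. The function name `early_stopping_epoch` and the `patience` parameter (how many non-improving epochs to tolerate) are hypothetical additions, not part of the notes.

```python
def early_stopping_epoch(val_errors, patience=1):
    """Return the epoch index at which training would stop.

    Training stops once the validation error has failed to improve on
    its best value for `patience` consecutive epochs (assumed criterion).
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best = err
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    # The error never stopped improving: train through the last epoch.
    return len(val_errors) - 1


# Validation error falls, then starts to rise at epoch 3.
stop = early_stopping_epoch([0.9, 0.6, 0.5, 0.55, 0.7])
```

In practice one would usually also keep a copy of the weights from the best epoch, since the weights at the stopping epoch are already slightly past the minimum.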