Let $x = [x_1, x_2, \ldots, x_d]^T$ be a $d$-dimensional input.
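We need to find the parameters $W = [w_0, w_1, \ldots, w_d]^T$ such that the linear function

$$g(x \mid w_d, w_{d-1}, \ldots, w_1, w_0) = w_d x_d + w_{d-1} x_{d-1} + \cdots + w_1 x_1 + w_0$$

minimizes the squared error on the dataset $\{x^t, r^t\}_{t=1}^{N}$, where $x^t = [x_1^t, x_2^t, \ldots, x_d^t]^T$.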
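Written out, the squared error being minimized is the following. Some texts scale it by $\tfrac{1}{2}$, which does not change the minimizing $W$.

$$E(W) = \sum_{t=1}^{N}\left(r^t - g(x^t \mid W)\right)^2 = \sum_{t=1}^{N}\left(r^t - w_0 - w_1 x_1^t - \cdots - w_d x_d^t\right)^2$$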
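Let

$$D = \begin{bmatrix} 1 & x_1^1 & x_2^1 & \cdots & x_d^1 \\ 1 & x_1^2 & x_2^2 & \cdots & x_d^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^N & x_2^N & \cdots & x_d^N \end{bmatrix}_{N \times (d+1)} \quad \text{and} \quad r = \begin{bmatrix} r^1 \\ r^2 \\ \vdots \\ r^N \end{bmatrix}_{N \times 1}$$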
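Because row $t$ of $D$ is $[1, x_1^t, \ldots, x_d^t]$, the $t$-th entry of $DW$ is exactly $g(x^t \mid W)$. The squared error can therefore be written in matrix form and minimized by setting its gradient with respect to $W$ to zero:

$$E(W) = \sum_{t=1}^{N}\left(r^t - g(x^t \mid W)\right)^2 = (r - DW)^T (r - DW)$$

$$\nabla_W E = -2\, D^T (r - DW) = 0 \quad\Longrightarrow\quad D^T D\, W = D^T r$$

These are the normal equations. When $D^T D$ is invertible, multiplying both sides by $(D^T D)^{-1}$ isolates $W$.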
Then,

$$W = (D^T D)^{-1} D^T r$$
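The following is a minimal pure-Python sketch of this closed-form fit. It assumes the samples are given as a list of lists `X` (one row per $x^t$) and a list `r` of targets. The helper names (`design_matrix`, `transpose`, `matmul`, `solve`, `fit_linear`, `g`) are hypothetical and chosen for illustration. Rather than forming $(D^T D)^{-1}$ explicitly, the sketch solves $D^T D\,W = D^T r$ by Gauss-Jordan elimination with partial pivoting. This gives the same $W$ whenever the inverse exists, and it raises an error when the system is singular.

```python
# Hypothetical helpers: a sketch of the normal-equation fit, not a reference implementation.

def design_matrix(X):
    """Build D (N x (d+1)): one row [1, x_1^t, ..., x_d^t] per sample x^t."""
    return [[1.0] + [float(v) for v in x] for x in X]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product A @ B for matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b, rel_eps=1e-12):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]  # augmented [A | b]
    tol = rel_eps * max(1.0, max(abs(v) for row in A for v in row))
    for c in range(n):
        # Pick the row with the largest entry in column c as the pivot.
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) < tol:
            raise ValueError("D^T D is singular: no unique least-squares W")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        # Eliminate column c from every other row.
        for i in range(n):
            if i != c and M[i][c] != 0.0:
                f = M[i][c]
                M[i] = [vi - f * vc for vi, vc in zip(M[i], M[c])]
    return [M[i][n] for i in range(n)]

def fit_linear(X, r):
    """Return W = [w_0, w_1, ..., w_d] solving (D^T D) W = D^T r."""
    D = design_matrix(X)
    Dt = transpose(D)
    DtD = matmul(Dt, D)
    Dtr = [sum(a * b for a, b in zip(row, r)) for row in Dt]
    return solve(DtD, Dtr)

def g(x, W):
    """Evaluate g(x | W) = w_d x_d + ... + w_1 x_1 + w_0."""
    return W[0] + sum(w * v for w, v in zip(W[1:], x))

# Usage on made-up, noise-free data generated from r = 1 + 2*x_1 - 3*x_2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
r = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]
W = fit_linear(X, r)
print([round(w, 6) for w in W])  # recovers W close to [1, 2, -3]
```

For real workloads, a linear-algebra library's least-squares routine (for example NumPy's `numpy.linalg.lstsq`) is usually preferable, since it is numerically more stable than solving the normal equations directly.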
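Sometimes this inverse does not exist. $D^T D$ is singular whenever the columns of $D$ are linearly dependent. This happens when there are fewer data points than parameters ($N < d + 1$), or when some input attributes are linear combinations of others (for example, one attribute is a constant multiple of another).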
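The sketch below makes the singular cases concrete. It uses hypothetical helpers (`design`, `gram`, `det3`) and small integer toy data chosen for illustration. For $d = 2$, it computes $\det(D^T D)$ exactly. The determinant is zero both with too few samples and with collinear attributes, and nonzero for a well-posed design.

```python
# Hypothetical helpers and toy data, for illustration only.

def design(X):
    """Build D: one row [1, x_1^t, ..., x_d^t] per sample."""
    return [[1] + list(x) for x in X]

def gram(D):
    """Return D^T D for D given as a list of rows (exact for integer data)."""
    cols = list(zip(*D))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (p, q, s), (u, v, w) = M
    return a * (q * w - s * v) - b * (p * w - s * u) + c * (p * v - q * u)

# d = 2, so W has d + 1 = 3 entries and D^T D is 3x3.
too_few = design([[1, 2], [3, 4]])                    # N = 2 < d + 1 = 3
collinear = design([[1, 2], [2, 4], [3, 6], [4, 8]])  # x_2 = 2 * x_1 in every sample
well_posed = design([[0, 0], [1, 0], [0, 1], [1, 1]])

for name, D in [("too_few", too_few), ("collinear", collinear), ("well_posed", well_posed)]:
    print(name, det3(gram(D)))
# too_few 0
# collinear 0
# well_posed 4
```

With floating-point data, an exact zero determinant is a poor test. Checking the rank or the condition number of $D$ is a more reliable way to detect a (nearly) singular $D^T D$.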