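so that the linear function $g(\mathbf{x} \mid w_d, w_{d-1}, \dots, w_1, w_0) = w_d x_d + w_{d-1} x_{d-1} + \dots + w_1 x_1 + w_0$
minimizes the square error on the dataset $\{\mathbf{x}^t, r^t\}_{t=1}^{N}$, where $\mathbf{x}^t = [x_1^t, x_2^t, \dots, x_d^t]^T$.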
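Written out, the square error being minimized is the sum of squared residuals over the $N$ training pairs. Some texts include a constant factor such as $\tfrac{1}{2}$; a constant factor does not change which weights minimize it:

$$
E(w_0, w_1, \dots, w_d) = \sum_{t=1}^{N} \left( r^t - g(\mathbf{x}^t \mid w_d, \dots, w_0) \right)^2
= \sum_{t=1}^{N} \left( r^t - w_d x_d^t - \dots - w_1 x_1^t - w_0 \right)^2
$$

Setting $\partial E / \partial w_j = 0$ for each $j = 0, 1, \dots, d$ yields $d+1$ equations that are linear in the weights.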
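Let

$$
D = \begin{bmatrix}
1 & x_1^1 & x_2^1 & \cdots & x_d^1 \\
1 & x_1^2 & x_2^2 & \cdots & x_d^2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_1^N & x_2^N & \cdots & x_d^N
\end{bmatrix}_{N \times (d+1)}
\quad \text{and} \quad
r = \begin{bmatrix} r^1 \\ r^2 \\ \vdots \\ r^N \end{bmatrix}_{N \times 1}
$$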
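With $D$ and $r$ defined this way, the standard least-squares solution (the normal equations) is $(D^T D)\,\mathbf{w} = D^T r$, with $\mathbf{w} = [w_0, w_1, \dots, w_d]^T$ ordered to match the columns of $D$. The sketch below is one way to compute it. It is written in plain Python so that it has no dependencies. The helper names (`design_matrix`, `fit_linear`, and so on) are hypothetical and used only for illustration. It also assumes $D^T D$ is invertible, which means no feature is a linear combination of the others.

```python
from typing import List

Matrix = List[List[float]]


def design_matrix(xs: Matrix) -> Matrix:
    """Build D (N x (d+1)): row t is [1, x_1^t, ..., x_d^t]."""
    return [[1.0] + [float(v) for v in x] for x in xs]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Entry (i, j) is the dot product of row i of a with column j of b.
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col)) for col in zip(*b)]
            for row in a]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the caller's lists are untouched.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("D^T D is singular: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate entries below the pivot.
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit_linear(xs: Matrix, rs: List[float]) -> List[float]:
    """Return [w_0, w_1, ..., w_d] solving (D^T D) w = D^T r."""
    d_mat = design_matrix(xs)
    d_t = transpose(d_mat)
    a = matmul(d_t, d_mat)                                   # (d+1) x (d+1)
    b = [row[0] for row in matmul(d_t, [[r] for r in rs])]   # D^T r, flattened
    return solve(a, b)
```

As a usage example, targets generated exactly from $r = 2 + 3x_1 - x_2$ at the inputs `[[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]` should give `fit_linear` weights close to `[2, 3, -1]`. Explicitly forming $D^T D$ squares the condition number of the problem. For ill-conditioned data, a QR- or SVD-based solver such as NumPy's `numpy.linalg.lstsq` is usually the more robust choice.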