Best Approximation Theorem

When we want to solve $AX=B$ for $X$ and no exact solution exists, the least-squares approximation $X^*$ is the vector for which $AX^*$ is the orthogonal projection of $B$ onto the column space $col(A)$.

That’s because $col(A)$, the set of all vectors of the form $AX$, is a vector space (a subspace). If there is no exact solution, then $AX \neq B$ for every $X$, so $B$ lies outside $col(A)$. The distance $\|B-AX\|$ is smallest when $AX^*$ is the orthogonal projection of $B$ onto $col(A)$, and $X^*$ is then the best approximation in the least-squares sense. If the column vectors of $A$ are linearly independent, $A^TA$ is invertible and $X^*$ is unique.

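Thus the residual is orthogonal to every vector in $col(A)$: $(B-AX^*) \bot AX$, that is, $(B-AX^*)^TAX=0$ for every $X$. Because this holds for every $X$, we get $A^T(B-AX^*)=0$, so $A^TAX^*=A^TB$, and when the columns of $A$ are linearly independent, $X^*=(A^T A)^{-1}A^TB$.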
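To make the formula concrete, here is a minimal sketch in plain Python that solves $A^TAX^*=A^TB$ for a small example and checks that the residual $B-AX^*$ is orthogonal to the columns of $A$. The matrix `A`, the vector `B`, and the helper names (`transpose`, `matmul`, `matvec`, `solve_2x2`) are illustrative assumptions, not taken from the references, and the solver only handles an $A$ with two columns.

```python
from fractions import Fraction


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]


def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def matvec(M, v):
    """Multiply a matrix by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve_2x2(M, rhs):
    """Solve M x = rhs for a 2x2 matrix M using Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("A^T A is singular: the columns of A are linearly dependent")
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]


# Hypothetical overdetermined system: 3 equations, 2 unknowns, no exact solution.
# Fractions keep the arithmetic exact.
A = [[Fraction(1), Fraction(0)],
     [Fraction(1), Fraction(1)],
     [Fraction(1), Fraction(2)]]
B = [Fraction(6), Fraction(0), Fraction(0)]

At = transpose(A)
AtA = matmul(At, A)               # A^T A
AtB = matvec(At, B)               # A^T B
x_star = solve_2x2(AtA, AtB)      # solves A^T A X* = A^T B

projection = matvec(A, x_star)    # A X*, the projection of B onto col(A)
residual = [b - p for b, p in zip(B, projection)]
At_residual = matvec(At, residual)  # zero vector if the residual is orthogonal to col(A)

print("X*         =", [str(v) for v in x_star])
print("A X*       =", [str(v) for v in projection])
print("A^T (B-AX*) =", [str(v) for v in At_residual])
```

For larger or floating-point problems, a library routine that solves least-squares problems directly is usually preferable to forming $(A^TA)^{-1}$ explicitly, since the normal equations can be numerically ill-conditioned.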

See:
http://www.minho-kim.com/courses/10sp71007/data/p07-handout.pdf
http://people.ucsc.edu/~lewis/Math140/Ortho_projections.pdf