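When we want to find $x$ satisfying $Ax = b$ and there is no exact solution $x$, the least-squares approximation $\hat{x}$ is the vector for which $A\hat{x}$ is the projection of $b$ onto the column space $C(A)$.
That's because, if the column vectors of $A$ are linearly independent, they form a basis of a vector space, the column space $C(A)$, and every product $Ax$ lies in it. If there is no exact solution $x$, then $b \notin C(A)$. Thus, $b$ lies outside the vector space $C(A)$. The squared distance $\|b - Ax\|^2$ between $Ax$ and $b$ is smallest when $A\hat{x}$ is the projection of $b$ onto $C(A)$, where $\hat{x}$ is the best approximation.
Thus, the residual is orthogonal to the column space, $(b - A\hat{x}) \perp C(A)$, and so $(Ay)^T (b - A\hat{x}) = y^T A^T (b - A\hat{x}) = 0$ for every vector $Ay$ in $C(A)$. As this holds for every $y$, $A^T (b - A\hat{x}) = 0$, i.e. $A^T A \hat{x} = A^T b$. Since the columns of $A$ are linearly independent, $A^T A$ is invertible and $\hat{x} = (A^T A)^{-1} A^T b$.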
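The orthogonality argument can be checked numerically on a small system. The following is a minimal sketch, not a production solver: the matrix `A`, the vector `b`, and the helper names (`transpose`, `matmul`, `solve`, `least_squares`) are all made up for illustration, and it assumes the columns of `A` are linearly independent so that the normal equations have a unique solution. It uses exact fractions so the orthogonality check comes out to exactly zero.

```python
from fractions import Fraction


def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*M)]


def matmul(M, N):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]


def solve(M, rhs):
    """Solve the square system M x = rhs by Gauss-Jordan elimination.

    Assumes M is invertible; a singular M raises StopIteration.
    """
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs[i])] for i, row in enumerate(M)]
    for col in range(n):
        # Pick a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        aug[col] = [v / aug[col][col] for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def least_squares(A, b):
    """Solve the normal equations A^T A x = A^T b."""
    At = transpose(A)
    AtA = matmul(At, A)
    Atb = [sum(a * v for a, v in zip(row, b)) for row in At]
    return solve(AtA, Atb)


# Hypothetical example: 3 equations, 2 unknowns, no exact solution.
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]

x_hat = least_squares(A, b)                                  # best approximation
p = [sum(a * x for a, x in zip(row, x_hat)) for row in A]    # projection of b onto C(A)
r = [bi - pi for bi, pi in zip(b, p)]                        # residual b - A x_hat
At_r = [sum(a * v for a, v in zip(col, r)) for col in zip(*A)]  # A^T (b - A x_hat)

print(x_hat, p, r, At_r)
```

For this data the solution is $\hat{x} = (5, -3)$, the projection is $(5, 2, -1)$, and the residual $(1, -2, 1)$ is orthogonal to both columns of `A`. In practice a library routine such as NumPy's `numpy.linalg.lstsq` is the usual choice, since forming $A^T A$ explicitly can lose accuracy in floating point.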
See:
http://www.minho-kim.com/courses/10sp71007/data/p07-handout.pdf
http://people.ucsc.edu/~lewis/Math140/Ortho_projections.pdf