Linear Algebra

20 May 2019

Elementary matrix


Linearly dependent

Linearly independent

Ordinary least squares

Ordinary least squares (OLS) is a type of linear least squares method for estimating the unknown parameters in a linear regression model.

\[X\boldsymbol{\beta} = \boldsymbol{y}\]

Usually this equation is overdetermined (more equations than unknowns), so no exact solution $\boldsymbol{\beta}$ exists.

Instead, we want to find ${\hat {\boldsymbol{\beta}}}$ such that

\[{\hat {\boldsymbol{\beta}}} = \operatorname{arg\,min}_{\boldsymbol{\beta}} {\lVert X\boldsymbol{\beta} - \boldsymbol{y}\rVert^2}\]

To find the critical point, we compute the gradient of the squared-error term $S = \lVert X\boldsymbol{\beta} - \boldsymbol{y}\rVert^2$ with respect to $\boldsymbol{\beta}$.

\[\begin{aligned} \nabla_{\boldsymbol{\beta}}S & = \nabla_{\boldsymbol{\beta}} \lVert X\boldsymbol{\beta} - \boldsymbol{y}\rVert^2 \\ & = \nabla_{\boldsymbol{\beta}} (X\boldsymbol{\beta} - \boldsymbol{y})^{\mathsf {T}}(X\boldsymbol{\beta} - \boldsymbol{y}) \\ & =\nabla_{\boldsymbol{\beta}} (\boldsymbol{\beta}^{\mathsf {T}}X^{\mathsf {T}} - \boldsymbol{y}^{\mathsf {T}})(X\boldsymbol{\beta} - \boldsymbol{y})\\ & =\nabla_{\boldsymbol{\beta}} (\boldsymbol{\beta}^{\mathsf {T}}X^{\mathsf {T}}X\boldsymbol{\beta} - \boldsymbol{\beta}^{\mathsf {T}}X^{\mathsf {T}}\boldsymbol{y} - \boldsymbol{y}^{\mathsf {T}}X\boldsymbol{\beta} + \boldsymbol{y}^{\mathsf {T}}\boldsymbol{y})\\ & =\nabla_{\boldsymbol{\beta}} (\boldsymbol{\beta}^{\mathsf {T}}X^{\mathsf {T}}X\boldsymbol{\beta} - \boldsymbol{\beta}^{\mathsf {T}}X^{\mathsf {T}}\boldsymbol{y} - (\boldsymbol{\beta}^{\mathsf {T}}X^{\mathsf {T}}\boldsymbol{y})^{\mathsf {T}} + \boldsymbol{y}^{\mathsf {T}}\boldsymbol{y})\\ & =\nabla_{\boldsymbol{\beta}} (\boldsymbol{\beta}^{\mathsf {T}}X^{\mathsf {T}}X\boldsymbol{\beta} - 2 \boldsymbol{\beta}^{\mathsf {T}}X^{\mathsf {T}}\boldsymbol{y} + \boldsymbol{y}^{\mathsf {T}}\boldsymbol{y})\\ & =\nabla_{\boldsymbol{\beta}} (\boldsymbol{\beta}^{\mathsf {T}}X^{\mathsf {T}}X\boldsymbol{\beta} - 2 \boldsymbol{\beta}^{\mathsf {T}}X^{\mathsf {T}}\boldsymbol{y})\\ & =2 X^{\mathsf {T}}X\boldsymbol{\beta} - 2 X^{\mathsf {T}}\boldsymbol{y} \end{aligned}\]
(In the fifth step, the scalar $\boldsymbol{y}^{\mathsf {T}}X\boldsymbol{\beta}$ is replaced by its transpose; a scalar equals its own transpose, so the two cross terms combine into $-2\boldsymbol{\beta}^{\mathsf {T}}X^{\mathsf {T}}\boldsymbol{y}$.)

Set $\nabla_{\boldsymbol{\beta}}S = 0$ for the critical point.

\[\begin{aligned} \nabla_{\boldsymbol{\beta}}S = 2 X^{\mathsf {T}}X\boldsymbol{\beta} - 2 X^{\mathsf {T}}\boldsymbol{y} = 0 \\ X^{\mathsf {T}}X\boldsymbol{\beta} = X^{\mathsf {T}}\boldsymbol{y} \\ \boldsymbol{\beta} = (X^{\mathsf {T}}X)^{-1} X^{\mathsf {T}}\boldsymbol{y} \\ \end{aligned}\]


Hence, when $X$ has full column rank (so that $X^{\mathsf {T}}X$ is invertible),

\({\hat {\boldsymbol{\beta}}} = (X^{\mathsf {T}}X)^{-1} X^{\mathsf {T}}\boldsymbol{y}\).
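A minimal NumPy sketch of this formula, using made-up data points (the data and variable names are illustrative, not from the text):

```python
import numpy as np

# Hypothetical data lying exactly on y = 1 + 2x.
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Normal equations: solve (X^T X) beta = X^T y instead of forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# In practice np.linalg.lstsq is preferred for numerical stability.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Solving the linear system directly avoids explicitly computing $(X^{\mathsf {T}}X)^{-1}$, which is both slower and less accurate.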


orthogonal set

orthonormal set

standard basis

The set of unit vectors pointing along the coordinate axes of Euclidean space. For example, in $\mathbb{R}^2$: $(1, 0)$ and $(0, 1)$.


A determinant expresses the signed $n$-dimensional volume of the $n$-dimensional parallelepiped spanned by the matrix's column vectors.

diagonal matrix

symmetric matrix

skew-symmetric matrix (anti-symmetric matrix)

\[A^{\mathsf {T}} = -A\]

orthogonal matrix

\[Q^{\mathsf {T}} Q = QQ^{\mathsf {T}} = I\]

conjugate transpose

\[A^\ast={\overline {A^{\mathsf {T}}}}\]

Hermitian matrix

\[A^\ast = A\]


unitary matrix

\[U^\ast U = UU^\ast = I\]

Column vectors form an orthonormal set in $\mathbb{C}^n$.

normal matrix

\[A^\ast A = AA^\ast\]

positive definite

\[x^{\mathsf {T}}Ax \gt 0 \quad \text{for all } x \neq 0\]

positive semi-definite

\[x^{\mathsf {T}}Ax \ge 0 \quad \text{for all } x\]
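For a symmetric matrix, definiteness can be checked numerically via its eigenvalues (all positive iff positive definite, all nonnegative iff positive semi-definite). A minimal sketch with a made-up matrix:

```python
import numpy as np

# Example symmetric matrix; its eigenvalues are 1 and 3.
A = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])

# eigvalsh is specialized for symmetric/Hermitian matrices.
eigvals = np.linalg.eigvalsh(A)

is_pd = bool(np.all(eigvals > 0))    # positive definite
is_psd = bool(np.all(eigvals >= 0))  # positive semi-definite
```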

QR Factorization

\[A = QR\]

where $Q$ has orthonormal columns and $R$ is upper triangular.
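A quick NumPy sketch (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Reduced QR: Q is 3x2 with orthonormal columns, R is 2x2 upper triangular.
Q, R = np.linalg.qr(A)

ortho_ok = np.allclose(Q.T @ Q, np.eye(2))  # orthonormal columns
recon_ok = np.allclose(Q @ R, A)            # A = QR
```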

eigenvalues and eigenvectors

\[Ax = \lambda x\]


eigendecomposition

\[A = Q \Lambda Q^{-1}\]

(special cases)

for real symmetric matrices

\(A = Q \Lambda Q^{\mathsf {T}}\), with $Q$ orthogonal.
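The symmetric case can be sketched in NumPy with an arbitrary example matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])  # real symmetric

# eigh is specialized for symmetric/Hermitian matrices:
# eigenvalues come back real, and Q is orthogonal.
lam, Q = np.linalg.eigh(A)

recon_ok = np.allclose(Q @ np.diag(lam) @ Q.T, A)  # A = Q Λ Q^T
ortho_ok = np.allclose(Q.T @ Q, np.eye(2))         # Q orthogonal
```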


SVD factors any matrix $M$ as $M = U \Sigma V^{\mathsf {T}}$: it finds an orthogonal matrix $V$ such that $MV$ can still be written as the product of another orthogonal matrix $U$ and a diagonal matrix $\Sigma$, i.e. $MV = U\Sigma$.

reduced SVD

truncated SVD

Keep only the $k$ largest singular values (and the corresponding columns of $U$ and $V$).
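A sketch of truncation in NumPy, on a random example matrix; by the Eckart–Young theorem the result is the best rank-$k$ approximation in Frobenius norm:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))

# Reduced SVD: U is 6x4, s has 4 singular values, Vt is 4x4.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 2  # keep the k largest singular values
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The approximation error equals the norm of the dropped singular values.
err = np.linalg.norm(M - M_k)
```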


Gramian matrix

\[G = V^{\mathsf {T}}V, \quad G_{ij} = v_i^{\mathsf {T}} v_j\]


Please create a GitHub issue for discussion.🙂