Linear Algebra


  • A vector $v \in \mathbb{R}^n$ is a tuple of $n$ real numbers.
  • A matrix $A \in \mathbb{R}^{m \times n}$ is a collection of $m \times n$ real numbers arranged in $m$ rows and $n$ columns.

Operations

  • Vector
    • $+$, $-$, $\odot$, and scalar multiplication ($\alpha \cdot$) are element-wise operations.
    • Norm: $||v|| = \sqrt{\sum_{i=1}^n v_i^2}$
    • Inner product: $v \cdot w = \sum_{i=1}^n v_i \cdot w_i = ||v|| \cdot ||w|| \cdot \cos(\theta)$
    • Cross product:
    $v \times w = (v_1, v_2, v_3) \times (w_1, w_2, w_3) = (v_2 w_3 - v_3 w_2,\ v_3 w_1 - v_1 w_3,\ v_1 w_2 - v_2 w_1)$, with $||v \times w|| = ||v|| \cdot ||w|| \cdot \sin(\theta)$
    • The cross product is anti-commutative: $v \times w = -(w \times v)$.
  • Matrix
    • $+$, $-$, $\odot$, and scalar multiplication ($\alpha \cdot$) are element-wise operations.
    • Matrix multiplication: $C = A \cdot B$, where $c_{ij} = \sum_{k} a_{ik} b_{kj}$; defined when the number of columns of $A$ equals the number of rows of $B$.
    • Matrix transpose: $A^T$: interchange rows and columns, i.e. $(A^T)_{ij} = a_{ji}$.
    • Trace of a matrix: $tr(A) = \sum_{i=1}^n a_{ii}$
    • Determinant of a $3 \times 3$ matrix: $\det\left(\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}\right) = a (e i - f h) - b (d i - f g) + c (d h - e g)$
    • Minor of a matrix $M_{i, j}$: defined for each element of the matrix, the minor is the determinant of the submatrix that remains after removing the row and the column containing that element.
    • Cofactor of a minor: $C_{i, j} = (-1)^{i+j} M_{i, j}$
    • Adjoint of a matrix: $Adj(A) = [C_{i, j}]^T$, where $C_{i, j}$ is the cofactor of $A_{i, j}$
    • Inverse of a matrix: $A^{-1} = \frac{1}{\det(A)} Adj(A)$
      • The inverse does not exist when $\det(A) = 0$, which means the matrix is not full rank: one of the rows (or columns) is a linear combination of the other rows (or columns).
  • Matrix-Vector
    • $y = A \cdot x$: Multiplying matrix $A$ by vector $x$ produces a linear combination of the columns of $A$, weighted by the entries of $x$. This introduces the concept of a linear transformation. (See the NumPy sketch after this list.)
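
A minimal NumPy sketch of the operations above; the particular vectors and matrices are only illustrative.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

# Element-wise vector operations and scalar multiplication
print(v + w, v - w, v * w, 2.0 * v)
print(np.linalg.norm(v))                      # ||v|| = sqrt(sum(v_i^2))

# Inner and cross product
print(np.dot(v, w))                           # sum(v_i * w_i)
print(np.cross(v, w))                         # (v2 w3 - v3 w2, v3 w1 - v1 w3, v1 w2 - v2 w1)

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
B = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(A @ B)                                  # matrix multiplication
print(A.T)                                    # transpose
print(np.trace(A))                            # sum of diagonal entries
print(np.linalg.det(A))                       # determinant
print(np.linalg.inv(A))                       # inverse (exists since det(A) != 0)

# Matrix-vector product: a linear combination of the columns of A
x = np.array([3.0, -1.0])
print(A @ x)
print(3.0 * A[:, 0] - 1.0 * A[:, 1])          # same result
```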

TODO: Add properties of different types of matrices (diagonal, symmetric, etc)
TODO: Add notes on solving linear equations

Linear Transformation

Applying a matrix $A$ to a vector $x$ can be viewed as a linear transformation of the vector. A linear transformation is a mapping that preserves the following properties (checked numerically in the sketch after the list):

  • $A (x + y) = A x + A y$
  • $A (\alpha x) = \alpha A x$
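
A quick numerical check of the two properties, using arbitrary illustrative values:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
x = np.array([1.0, -2.0])
y = np.array([4.0, 0.5])
alpha = 2.5

# Additivity: A(x + y) == Ax + Ay
print(np.allclose(A @ (x + y), A @ x + A @ y))        # True

# Homogeneity: A(alpha x) == alpha (A x)
print(np.allclose(A @ (alpha * x), alpha * (A @ x)))  # True
```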

TODO: Add notes about translation, rotation, scaling, reflection?

Eigenvalues and Eigenvectors

For a square matrix $A$, an eigenvalue $\lambda$ and its eigenvector $v \neq 0$ satisfy $A v = \lambda v$. In other words, transforming $v$ by $A$ is equivalent to scaling $v$ by the eigenvalue $\lambda$, so the direction of $v$ is preserved.

  • The sum of the eigenvalues is the trace of the matrix, $tr(A)$.
  • The product of the eigenvalues is the determinant of the matrix, $\det(A)$.
  • The eigenvalues are the roots of the characteristic polynomial $\det(A - \lambda I) = 0$ (this holds for any square matrix, not just invertible ones).
  • If $A$ is invertible, the eigenvalues of $A^{-1}$ are the reciprocals of the eigenvalues of $A$, and the eigenvectors are the same.
  • The eigenvalues of $A^T$ are the same as the eigenvalues of $A$.
  • If the matrix is singular, $\det(A) = 0$, at least one eigenvalue is $0$.
  • For a symmetric matrix $A$, the eigenvalues are real and the eigenvectors can be chosen to be orthogonal. (See the sketch below.)
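
A short NumPy sketch verifying several of these properties on a small symmetric matrix (values chosen only for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                    # symmetric, so eigenvalues are real

eigvals, eigvecs = np.linalg.eig(A)           # columns of eigvecs are eigenvectors

# A v = lambda v for each eigenpair
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))        # True

# Sum of eigenvalues == trace, product == determinant
print(np.isclose(eigvals.sum(), np.trace(A)))         # True
print(np.isclose(eigvals.prod(), np.linalg.det(A)))   # True

# Eigenvectors of the symmetric matrix are orthonormal
print(np.allclose(eigvecs.T @ eigvecs, np.eye(2)))    # True
```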

Diagonalization

A square matrix $A$ is said to be diagonalizable if there exists an invertible matrix $P$ such that $A = PDP^{-1}$, where $D$ is a diagonal matrix.

Consider the matrix $P = \begin{bmatrix} v_1 & v_2 & \ldots \end{bmatrix}$, where $v_1, v_2, \ldots$ are linearly independent eigenvectors of $A$ (if the eigenvalues are distinct, the corresponding eigenvectors are linearly independent). Then, by definition,

$$
\begin{aligned}
A \cdot P &= A \cdot \begin{bmatrix} v_1 & v_2 & \ldots \end{bmatrix} \\
&= \begin{bmatrix} \lambda_1 v_1 & \lambda_2 v_2 & \ldots \end{bmatrix} \\
&= \begin{bmatrix} v_1 & v_2 & \ldots \end{bmatrix} \cdot \begin{bmatrix} \lambda_1 & 0 & \ldots \\ 0 & \lambda_2 & \ldots \\ \vdots & \vdots & \ddots \end{bmatrix} \\
&= P \cdot D
\end{aligned}
$$

Multiplying both sides by $P^{-1}$ on the right,

$$
A \cdot P \cdot P^{-1} = P \cdot D \cdot P^{-1} \implies A = P \cdot D \cdot P^{-1}
$$

Therefore, $A$ is diagonalizable using its linearly independent eigenvectors and the diagonal matrix of eigenvalues. If $A$ is not diagonalizable, it is called a defective matrix. One reason this can happen: if the eigenvalues are repeated, the eigenvectors might not be linearly independent, so there may be too few independent eigenvectors to form $P$. (TODO: More information on this is confusing)
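
A minimal NumPy sketch of the factorization, assuming a small matrix with distinct eigenvalues (the values are only illustrative):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])                        # distinct eigenvalues, so diagonalizable

eigvals, P = np.linalg.eig(A)                     # columns of P are eigenvectors
D = np.diag(eigvals)

# Verify A = P D P^{-1}
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True
```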

Inverse using eigen decomposition

If $A$ can be eigen-decomposed and none of its eigenvalues are zero, then $A$ is invertible and $A^{-1} = P \cdot D^{-1} \cdot P^{-1}$, where $D^{-1}$ simply has the reciprocals $1/\lambda_i$ on its diagonal.
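
Continuing the sketch above, a quick check of this identity on the same illustrative matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals, P = np.linalg.eig(A)

# All eigenvalues are non-zero, so A^{-1} = P D^{-1} P^{-1}
D_inv = np.diag(1.0 / eigvals)
A_inv = P @ D_inv @ np.linalg.inv(P)
print(np.allclose(A_inv, np.linalg.inv(A)))   # True
```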

Singular Value Decomposition

A generalisation of eigendecomposition to an $m \times n$ matrix $A$.

SVD is a factorization of a real or complex matrix into a rotation/reflection, followed by a scaling, followed by another rotation/reflection. For a matrix $M \in \mathbb{R}^{m \times n}$, the SVD is given by $M = U \Sigma V^T$, where $U \in \mathbb{R}^{m \times m}, \Sigma \in \mathbb{R}^{m \times n}, V \in \mathbb{R}^{n \times n}$. Here $U$ and $V$ are orthogonal matrices (unitary in the complex case) and $\Sigma$ is a rectangular diagonal matrix of non-negative real numbers.

  • The diagonal entries of $\Sigma$ are called the singular values of $M$.
  • The columns of $U$ are called the left singular vectors of $M$.
  • The columns of $V$ are called the right singular vectors of $M$.
  • The singular values of $M$ are (usually) sorted in non-increasing order.
  • The SVD is not unique. (See the sketch below.)
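
A minimal NumPy sketch of the decomposition on an arbitrary $2 \times 3$ matrix (values are only illustrative):

```python
import numpy as np

M = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])                  # 2 x 3

U, s, Vt = np.linalg.svd(M, full_matrices=True)   # U: 2x2, s: singular values, Vt: 3x3

# Rebuild the rectangular Sigma and check M = U Sigma V^T
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(M, U @ Sigma @ Vt))             # True

# Singular values are returned in non-increasing order
print(s)
```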

Understanding SVD through Eigen decomposition

For a matrix $A$, we want to find $U, V, \Sigma$ such that $A = U \Sigma V^T$.

Consider $A^TA$, which is a symmetric and positive semi-definite matrix. We can write its eigendecomposition as $A^TA = V \Lambda V^T$, where $V$ is an orthogonal matrix and $\Lambda$ is a diagonal matrix of the eigenvalues of $A^TA$. The eigenvalues $\lambda_i$ are non-negative (and strictly positive when $A$ has full column rank). Thus, for any unit eigenvector $v_i$, we have $A^TAv_i = \lambda_i v_i$.

Let $\sigma_i = \sqrt{\lambda_i}$ be the diagonal elements of $\Sigma$; these are called the singular values of $A$. Next, define $u_i = \frac{Av_i}{\sigma_i}$ (for $\sigma_i \neq 0$): $A$ maps the orthogonal unit vectors $v_i$ to another set of orthogonal vectors $Av_i$, but these have to be normalised.

Since $A^TA v_i = \lambda_i v_i = \sigma_i^2 v_i$,

$$
\begin{aligned}
||A v_i||^2 &= (Av_i)^T Av_i \\
&= v_i^T A^T A v_i \\
&= v_i^T (\sigma_i^2 v_i) \\
&= \sigma_i^2 ||v_i||^2 = \sigma_i^2
\end{aligned}
$$

so $||A v_i|| = \sigma_i$.

So we can take another set of orthogonal unit vectors $u_i = \frac{Av_i}{\sigma_i}$.

Thus, $Av_i = \sigma_i u_i$, and stacking this over all $i$ for the entire matrix, $AV = U \Sigma$. Multiplying both sides by $V^{-1}$ on the right (and using $V^{-1} = V^T$ since $V$ is orthogonal), we get $A = U \Sigma V^{-1} = U \Sigma V^T$.
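
A small NumPy sketch of this construction, assuming an illustrative $3 \times 2$ matrix with full column rank; it builds $V$ and the $\sigma_i$ from the eigendecomposition of $A^TA$, forms $u_i = Av_i / \sigma_i$, and checks the result (this yields the reduced $U$ with orthonormal columns):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])                         # 3 x 2, full column rank

# Eigendecomposition of A^T A gives V and the squared singular values
lam, V = np.linalg.eigh(A.T @ A)                   # eigh returns ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]                     # reorder to non-increasing
sigma = np.sqrt(lam)

# u_i = A v_i / sigma_i should be orthonormal
U = (A @ V) / sigma
print(np.allclose(U.T @ U, np.eye(2)))             # True

# A v_i = sigma_i u_i stacks up to A = U Sigma V^T
print(np.allclose(A, U @ np.diag(sigma) @ V.T))    # True
```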

TODO: The intuition behind coming up with $u_i = \frac{Av_i}{\sigma_i}$ has to be checked.