QR Decomposition: Let \(\mathbf{X}\) be an \(n\times p\) matrix of rank \(p\) (\(n\ge p\)). Then \(\mathbf{X}\) can be written as \(\mathbf{X} = \mathbf{Q}_{n\times p}\mathbf{R}_{p\times p}\), where \(\mathbf{Q}^T\mathbf{Q}=\mathbf{I}\) and \(\mathbf{R}\) is an upper-triangular matrix (Harville 1997).
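As a quick numerical sanity check (my addition, not part of the original text), NumPy's `np.linalg.qr` computes this reduced factorization directly, and the two defining properties can be verified on an arbitrary full-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # a 6x3 matrix, full column rank with probability 1

Q, R = np.linalg.qr(A)            # reduced QR: Q is 6x3, R is 3x3

assert np.allclose(Q.T @ Q, np.eye(3))  # orthonormal columns: Q^T Q = I
assert np.allclose(R, np.triu(R))       # R is upper triangular
assert np.allclose(Q @ R, A)            # the factors reproduce A
```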
In the least squares problem, \(\mathbf{y} = \boldsymbol{X\beta} + \mathbf{e}\), where \(\mathbf{e}\sim \text{MVN}(\mathbf{0},\ \sigma^2\mathbf{I})\).
It is well known that the least squares estimator is \(\hat{\boldsymbol{\beta}}=(\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\mathbf{y}\). Using \(\mathbf{X} = \mathbf{Q}\mathbf{R}\), \(\mathbf{Q}^T\mathbf{Q}=\mathbf{I}\), and the fact that \(\boldsymbol{R}\) is invertible because \(\mathbf{X}\) has full column rank:
\[\begin{equation} \begin{split} \hat{\boldsymbol{\beta}}&= (\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\mathbf{y}\\ &=(\boldsymbol{R}^T\boldsymbol{Q}^T\boldsymbol{Q}\boldsymbol{R})^{-1}\boldsymbol{R}^T\boldsymbol{Q}^T\mathbf{y}\\ &=(\boldsymbol{R}^T\boldsymbol{R})^{-1}\boldsymbol{R}^T\boldsymbol{Q}^T\mathbf{y}\\ &=\boldsymbol{R}^{-1}(\boldsymbol{R}^T)^{-1}\boldsymbol{R}^T\boldsymbol{Q}^T\mathbf{y}\\ &=\boldsymbol{R}^{-1}\boldsymbol{Q}^T\mathbf{y} \end{split} \end{equation}\]
I will demonstrate an application of the QR decomposition to least squares regression.
```python
import numpy as np
import statsmodels.api as sm
import timeit
import matplotlib.pyplot as plt

np.random.seed(511)

# Response: three groups of 100 observations with means 20, 40, and 60
y = np.concatenate([np.random.normal(loc=20, scale=1, size=100),
                    np.random.normal(loc=40, scale=1, size=100),
                    np.random.normal(loc=60, scale=1, size=100)])

# Design matrix: intercept plus indicators for groups 1 and 2
X1 = np.ones(300)
X2 = np.concatenate([np.ones(100), np.zeros(200)])
X3 = np.concatenate([np.zeros(100), np.ones(100), np.zeros(100)])
X = np.column_stack((X1, X2, X3))
```
```python
# Normal equations, with an explicit inverse of X^T X
beta_est = np.linalg.inv(X.T @ X) @ X.T @ y
beta_est
## array([ 60.03807704, -40.05588504, -20.08218135])
```

With this coding, the intercept recovers the group 3 mean (\(\approx 60\)), and the two slopes recover the differences of groups 1 and 2 from group 3 (\(\approx -40\) and \(\approx -20\)).
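As an aside (not in the original post), NumPy's built-in least squares driver, `np.linalg.lstsq`, which uses an SVD-based LAPACK routine, should reproduce the same estimates:

```python
# SVD-based least squares solve; rcond=None opts into the modern default
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```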
```python
# QR route: beta = R^{-1} Q^T y
Q, R = np.linalg.qr(X)
beta_est = np.linalg.inv(R) @ Q.T @ y
beta_est
## array([ 60.03807704, -40.05588504, -20.08218135])
```
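In practice it is better to avoid forming \(\boldsymbol{R}^{-1}\) explicitly and instead solve the triangular system \(\boldsymbol{R}\hat{\boldsymbol{\beta}}=\boldsymbol{Q}^T\mathbf{y}\) by back substitution. A minimal sketch, assuming SciPy is available (`scipy.linalg.solve_triangular` is my substitution, not part of the original code); it should agree with the estimates above:

```python
from scipy.linalg import solve_triangular

# Back-substitute R @ beta = Q.T @ y instead of inverting R
beta_bs = solve_triangular(R, Q.T @ y, lower=False)
```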
For comparison, fit the same model with statsmodels; the coefficient estimates match:

```python
model = sm.OLS(y, X)
model.fit().summary()
```
| Dep. Variable: | y | R-squared: | 0.996 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.996 |
| Method: | Least Squares | F-statistic: | 3.785e+04 |
| Date: | Sat, 25 Feb 2023 | Prob (F-statistic): | 0.00 |
| Time: | 22:18:45 | Log-Likelihood: | -432.90 |
| No. Observations: | 300 | AIC: | 871.8 |
| Df Residuals: | 297 | BIC: | 882.9 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 60.0381 | 0.103 | 583.176 | 0.000 | 59.835 | 60.241 |
| x1 | -40.0559 | 0.146 | -275.121 | 0.000 | -40.342 | -39.769 |
| x2 | -20.0822 | 0.146 | -137.933 | 0.000 | -20.369 | -19.796 |

| Omnibus: | 1.175 | Durbin-Watson: | 2.002 |
|---|---|---|---|
| Prob(Omnibus): | 0.556 | Jarque-Bera (JB): | 0.906 |
| Skew: | 0.099 | Prob(JB): | 0.636 |
| Kurtosis: | 3.183 | Cond. No. | 3.73 |
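Since `timeit` was imported above but not used in this excerpt, here is a minimal timing sketch (my assumption about the intended comparison, not the original benchmark) contrasting the two solvers on the same data:

```python
def solve_normal_eq(X, y):
    # Explicit inverse of X^T X
    return np.linalg.inv(X.T @ X) @ X.T @ y

def solve_qr(X, y):
    # Factorize, then beta = R^{-1} Q^T y
    Q, R = np.linalg.qr(X)
    return np.linalg.inv(R) @ Q.T @ y

t_ne = timeit.timeit(lambda: solve_normal_eq(X, y), number=2000)
t_qr = timeit.timeit(lambda: solve_qr(X, y), number=2000)
print(f"normal equations: {t_ne:.4f}s, QR: {t_qr:.4f}s")
```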