1 Introduction

It has been four years since I first learned about generalized linear models, and at the time I found them a difficult topic. I was confused by terminology such as link function, canonical link, the mean-variance relationship, and the asymptotic normality of the MLE. It is worth the effort to write down proofs and explanations of some important results about generalized linear models (GLMs).

2 Definition of Exponential Dispersion Family

In the GLM framework, the random variable \(y\) is assumed to follow a distribution in the exponential dispersion family, defined as:

\[\begin{equation} f(y;\theta,\phi)=\exp \left\{\frac{y \theta-b(\theta)}{a(\phi)}+c(y, \phi)\right\} \tag{2.1} \end{equation}\]
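For example, the Poisson distribution with mean \(\lambda\) belongs to this family. Writing its probability mass function in the form (2.1),

\[f(y ; \lambda)=\frac{\lambda^{y} e^{-\lambda}}{y !}=\exp \left\{y \log \lambda-\lambda-\log y !\right\}\]

so \(\theta=\log \lambda\), \(b(\theta)=e^{\theta}\), \(a(\phi)=1\), and \(c(y, \phi)=-\log y!\).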

3 Mean and Variance Relationship

The log-likelihood of one observation is

\[\begin{equation} \ell(\theta;y,\phi)= \log(f(y ; \theta, \phi))=\frac{y \theta-b(\theta)}{a(\phi)}+c(y, \phi) \tag{3.1} \end{equation}\]

Under regularity conditions, the expectation of the score statistic, \(s(\theta) = \frac{\partial \ell(\theta;y,\phi)}{\partial \theta}\), is 0.

\[\begin{equation} \begin{aligned} E\left(\frac{\partial \ell(\theta ; y, \phi)}{\partial \theta}\right) &= E\left(\frac{1}{f(y ; \theta, \phi)} \frac{\partial f(y ; \theta, \phi)}{\partial \theta}\right)\\ &= \int \frac{1}{f(y ; \theta, \phi)} \frac{\partial f(y ; \theta, \phi)}{\partial \theta} f(y ; \theta, \phi) \, dy \\ &= \int \frac{\partial f(y ; \theta, \phi)}{\partial \theta} \, dy\\ &= \frac{\partial}{\partial \theta} \int f(y ; \theta, \phi) \, dy\\ &= \frac{\partial 1}{\partial \theta} \\ &= 0 \end{aligned} \tag{3.2} \end{equation}\]

Under regularity conditions, the expected information, \(\mathcal{I}(\theta)=\mathrm{E}\left[\left(\frac{\partial}{\partial \theta}\ell(\theta ; y, \phi)\right)^{2} \right]\), can be expressed as \(-\mathrm{E}\left[\frac{\partial^{2}}{\partial \theta^{2}} \ell(\theta ; y, \phi)\right]\).

\[\begin{equation} \mathcal{I}(\theta)=\mathrm{E}\left[\left(\frac{\partial}{\partial \theta} \ell(\theta ; y, \phi)\right)^{2}\right] = -\mathrm{E}\left[\frac{\partial^{2}}{\partial \theta^{2}} \ell(\theta ; y, \phi)\right] \tag{3.3} \end{equation}\]

Because \[\begin{equation} \frac{\partial^{2}}{\partial \theta^{2}} \ell(\theta ; y, \phi) = \frac{-1}{f^2(y;\theta,\phi)} \left(\frac{\partial f(y ; \theta, \phi)}{\partial \theta}\right)^2 + \frac{1}{f(y ; \theta, \phi)}\frac{\partial^2 f(y ; \theta, \phi)}{\partial \theta^2} \tag{3.4} \end{equation}\]

And \[\begin{equation} E\left(\frac{1}{f(y ; \theta, \phi)} \frac{\partial^2 f(y ; \theta, \phi)}{\partial \theta^{2}}\right) = \int \frac{\partial^2 f(y ; \theta, \phi)}{\partial \theta^{2}} \, dy = \frac{\partial^2}{\partial \theta^2} \int f(y ; \theta, \phi) \, dy = 0 \tag{3.5} \end{equation}\]

Taking expectations of (3.4), the second term vanishes by (3.5), while the expectation of the first term is \(-\mathrm{E}\left[\left(\frac{\partial}{\partial \theta} \ell(\theta ; y, \phi)\right)^{2}\right]\); rearranging gives (3.3).

Now we have

\[E\left(\frac{\partial \ell(\theta ; y, \phi)}{\partial \theta}\right)= 0 \quad \text{and} \quad \mathrm{E}\left[\left(\frac{\partial}{\partial \theta} \ell(\theta ; y, \phi)\right)^{2}\right]=-\mathrm{E}\left[\frac{\partial^{2}}{\partial \theta^{2}} \ell(\theta ; y, \phi)\right]\] \[\begin{equation} E\left(\frac{\partial \ell(\theta ; y, \phi)}{\partial \theta}\right)=0 \quad \Rightarrow \quad E\left(\frac{y -b^{'}(\theta)}{a(\phi)}\right) = 0 \quad \Rightarrow \quad E(y) = b^{'}(\theta) \tag{3.6} \end{equation}\]

\[\begin{equation} \mathrm{E}\left[\left(\frac{\partial}{\partial \theta} \ell(\theta ; y, \phi)\right)^{2}\right] = \mathrm{E}\left[\left(\frac{y-b^{\prime}(\theta)}{a(\phi)}\right)^2 \right] = \frac{\text{var}(y)}{(a(\phi))^2} \tag{3.7} \end{equation}\]

\[\begin{equation} \mathrm{E}\left[\left(\frac{\partial}{\partial \theta} \ell(\theta ; y, \phi)\right)^{2}\right] = -\mathrm{E}\left[\frac{\partial^{2}}{\partial \theta^{2}} \ell(\theta ; y, \phi)\right] = E\left[ \frac{b^{''}(\theta)}{a(\phi)}\right] = \frac{b^{\prime \prime}(\theta)}{a(\phi)} \tag{3.8} \end{equation}\]

Combining (3.7) and (3.8),

\[\begin{equation} \text{var}(y) = b^{\prime \prime}(\theta) a(\phi) \tag{3.9} \end{equation}\]

Overall, in the GLM framework, the mean and variance of \(y\) can be expressed through \(b(\theta)\) and \(a(\phi)\):

\[ E(y)=b^{\prime}(\theta) \quad \text{and} \quad \operatorname{var}(y)=a(\phi) b^{\prime \prime}(\theta) \]
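As a quick numerical sanity check, here is a minimal sketch (my own illustration, using the Poisson example from Section 2, where \(b(\theta)=e^{\theta}\) and \(a(\phi)=1\); the value of \(\lambda\) is arbitrary):

```python
import numpy as np

# Poisson example: theta = log(lambda), b(theta) = exp(theta), a(phi) = 1
rng = np.random.default_rng(42)
lam = 3.5
theta = np.log(lam)

# Simulate many Poisson draws with mean lambda = exp(theta)
y = rng.poisson(lam, size=1_000_000)

# Theory: E(y) = b'(theta) = exp(theta) and var(y) = a(phi) b''(theta) = exp(theta)
print("sample mean:", y.mean(), "  b'(theta):", np.exp(theta))
print("sample var :", y.var(), "  a(phi)*b''(theta):", np.exp(theta))
```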

4 Link Function

In a GLM, the mean \(\mu_i = E(y_i)\) is related to the linear predictor \(\eta_i = \sum_{j}x_{i,j}\beta_j\) through a link function \(g\):

\[\begin{equation} g(\mu_i) = \eta_i = \sum_{j}x_{i,j}\beta_j \tag{4.1} \end{equation}\]

5 Likelihood Equations for GLM

For observation \(i\), differentiating the log-likelihood (3.1) with respect to \(\theta_i\) gives

\[\begin{equation} \frac{\partial \ell_i(\theta_i)}{\partial \theta_i} = \frac{y_i - b^{'}(\theta_i)}{a(\phi)} \tag{5.1} \end{equation}\]

From (3.6), we have \(E(y_i)=\mu_i=b^{\prime}(\theta_i)\) and \(\frac{\partial \mu_i}{\partial \theta_i} = b^{\prime\prime}(\theta_i)\), so that

\[\begin{equation} \frac{\partial \theta_i}{\partial \mu_i} = \frac{1}{\frac{\partial \mu_i}{\partial \theta_i}} = \frac{1}{b^{''}(\theta_i)} \tag{5.2} \end{equation}\]

From the link function relation (4.1), \(g(\mu_i) = \eta_i\), we have

\[\begin{equation} \frac{\partial \mu_i}{\partial \eta_i} = \frac{1}{g^{'}(\mu_i)} \tag{5.3} \end{equation}\]

Finally, from \(\eta_i = \sum_{j}x_{i,j}\beta_j\),

\[\begin{equation} \frac{\partial \eta_i}{\partial \beta_j} = x_{i,j} \tag{5.4} \end{equation}\]

Hence, by the chain rule,

\[\begin{equation} \frac{\partial \ell_i\left(\theta_{i}\right)}{\partial \beta_{j}} = \frac{\partial \ell_i\left(\theta_{i}\right)}{\partial \theta_{i}} \frac{\partial \theta_{i}}{\partial \mu_{i}} \frac{\partial \mu_{i}}{\partial \eta_{i}} \frac{\partial \eta_{i}}{\partial \beta_{j}} = \frac{(y_i-b^{\prime}\left(\theta_{i}\right))x_{i,j}}{a(\phi)b^{\prime \prime}\left(\theta_{i}\right)} \frac{1}{g^{\prime}\left(\mu_{i}\right)} = \frac{(y_i-\mu_i)x_{i,j}}{\text{var}(y_i)} \frac{1}{g^{\prime}\left(\mu_{i}\right)} \tag{5.5} \end{equation}\]

Overall, the contribution of observation \(i\) to the score for \(\beta_j\) is

\[\frac{\partial \ell_i\left(\theta_{i}\right)}{\partial \beta_{j}} = \frac{\left(y_{i}-\mu_{i}\right) x_{i, j}}{\operatorname{var}\left(y_{i}\right)} \frac{1}{g^{\prime}\left(\mu_{i}\right)} \quad \text{or, equivalently,} \quad \frac{\left(y_{i}-\mu_{i}\right) x_{i, j}}{\operatorname{var}\left(y_{i}\right)} \frac{\partial \mu_{i}}{\partial \eta_{i}}\]

and the likelihood equations for the GLM set the total score to zero:

\[\sum_{i=1}^{n} \frac{\left(y_{i}-\mu_{i}\right) x_{i, j}}{\operatorname{var}\left(y_{i}\right)} \frac{\partial \mu_{i}}{\partial \eta_{i}} = 0, \quad j = 1, \dots, p\]
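To make this concrete, here is a minimal sketch (my own illustration; the simulated `X`, `beta_true`, and the Poisson setup are assumptions, not from the text) for a Poisson GLM with the canonical log link, where \(\mu_i = e^{\eta_i}\), \(\operatorname{var}(y_i)=\mu_i\), and \(\partial\mu_i/\partial\eta_i=\mu_i\), so the score reduces to \(\sum_i (y_i-\mu_i)x_{i,j}\). The analytic score is checked against a numerical gradient of the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, -0.3, 0.2])
y = rng.poisson(np.exp(X @ beta_true))

def loglik(beta):
    eta = X @ beta
    # Poisson log-likelihood, up to the constant -log(y!)
    return np.sum(y * eta - np.exp(eta))

def score(beta):
    mu = np.exp(X @ beta)
    return X.T @ (y - mu)  # sum_i (y_i - mu_i) x_{i,j}

# Compare the analytic score with a central finite-difference gradient
beta0 = np.zeros(p)
eps = 1e-6
num_grad = np.array([
    (loglik(beta0 + eps * e) - loglik(beta0 - eps * e)) / (2 * eps)
    for e in np.eye(p)
])
print(np.allclose(score(beta0), num_grad, rtol=1e-4))  # expected: True
```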

6 Large-Sample Properties of the MLE in GLM

By the standard large-sample theory of maximum likelihood estimation, for large \(n\) we have, approximately,

\[\begin{equation} \hat{\boldsymbol{\beta}} \stackrel{\cdot}{\sim} \mathcal{N}\left(\boldsymbol{\beta}, \mathcal{I_n}^{-1}\right) \tag{6.1} \end{equation}\]

where \(\mathcal{I_n}\) is the expected (Fisher) information matrix for the \(n\) observations \(y_1,\ldots,y_n\).
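In practice, this result justifies Wald inference: for example, an approximate 95% confidence interval for \(\beta_j\) is \(\hat{\beta}_j \pm 1.96 \sqrt{(\mathcal{I_n}^{-1})_{j,j}}\), with \(\mathcal{I_n}\) evaluated at \(\hat{\boldsymbol{\beta}}\).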

For each observation \(y_i\), the \((j, k)\) element of its Fisher information matrix is defined as

\[\begin{equation} \begin{aligned} \mathcal{I}_{i,j, k} &\equiv \mathrm{E}\left[\left(\frac{\partial}{\partial \beta_{j}} \log f(y_i ; \boldsymbol{\beta})\right)\left(\frac{\partial}{\partial \beta_{k}} \log f(y_i ; \boldsymbol{\beta})\right)\right] \\ &= \mathrm{E}\left[ \frac{\left(y_{i}-\mu_{i}\right) x_{i, j}}{\operatorname{var}\left(y_{i}\right)} \frac{\partial \mu_{i}}{\partial \eta_{i}} \frac{\left(y_{i}-\mu_{i}\right) x_{i, k}}{\operatorname{var}\left(y_{i}\right)} \frac{\partial \mu_{i}}{\partial \eta_{i}} \right] \\ &= \frac{x_{i, j} x_{i, k}}{\operatorname{var}\left(y_{i}\right)} \left(\frac{\partial \mu_{i}}{\partial \eta_{i}}\right)^2 \end{aligned} \tag{6.2} \end{equation}\]

Since the observations are independent (\(y_i \perp y_j\) for all \(i \ne j\)), the log-likelihood of the full sample is the sum of the individual contributions, \[\begin{equation} \ell(\boldsymbol{\theta} ; \boldsymbol{y}, \phi) = \sum_i \ell(\boldsymbol{\theta} ; y_i, \phi) \tag{6.3} \end{equation}\]

The \((j, k)\) element of the Fisher information matrix \(\mathcal{I}_{n}\) is therefore

\[\begin{equation} \left(\mathcal{I}_{n}\right)_{j, k} = \sum_{i=1}^{n} \mathcal{I}_{i, j, k} \tag{6.4} \end{equation}\]

Overall, the Fisher information matrix can be expressed as

\[\begin{equation} \mathcal{I}_{n} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}, \quad \text{where } \boldsymbol{W} \text{ is a diagonal matrix with elements } w_i = \frac{\left(\frac{\partial \mu_{i}}{\partial \eta_{i}}\right)^{2}}{\operatorname{var}\left(y_{i}\right)} \tag{6.5} \end{equation}\]
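As an illustration, here is a minimal sketch (my own example with simulated data; `X`, `beta_true`, and the Poisson/log-link setup are assumptions, not from the text) that fits a Poisson GLM by Fisher scoring and reads off approximate standard errors from \(\mathcal{I}_n^{-1}\). For the canonical log link, \(\partial \mu_i / \partial \eta_i = \mu_i\) and \(\operatorname{var}(y_i) = \mu_i\), so \(w_i = \mu_i\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, -0.3, 0.2])
y = rng.poisson(np.exp(X @ beta_true))

# Fisher scoring: beta <- beta + I_n^{-1} * score, with I_n = X' W X
beta = np.zeros(p)
for _ in range(25):
    mu = np.exp(X @ beta)            # for the log link, w_i = mu_i
    I_n = X.T @ (mu[:, None] * X)    # X' W X with W = diag(mu)
    beta = beta + np.linalg.solve(I_n, X.T @ (y - mu))

# Approximate standard errors from (6.1): sqrt of diag(I_n^{-1}) at beta_hat
mu = np.exp(X @ beta)
I_n = X.T @ (mu[:, None] * X)
se = np.sqrt(np.diag(np.linalg.inv(I_n)))
print("beta_hat:", beta.round(3))
print("approx. std. errors:", se.round(3))
```

For the canonical link, the observed and expected information coincide, so the Fisher scoring iteration above is the same as Newton's method.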
