Define the log-likelihood as
\[\begin{equation} l(\boldsymbol{\theta}) = \sum_{i=1}^{n}\log f(y_i;\boldsymbol{\theta}) \tag{1.1} \end{equation}\]
The score is the gradient of the log-likelihood \(l(\boldsymbol{\theta})\)
\[\begin{equation} s(\boldsymbol{\theta}) = \frac{\partial l(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \begin{bmatrix} \frac{\partial l(\boldsymbol{\theta})}{\partial \theta_1}\\ \vdots\\ \frac{\partial l(\boldsymbol{\theta})}{\partial \theta_p} \end{bmatrix} \tag{1.2} \end{equation}\]
Define the Hessian matrix of the log-likelihood (the negative of which is the Observed Information) as
\[\begin{equation} H(\boldsymbol{\theta}) = \frac{\partial^2 l(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial\boldsymbol{\theta}^T}=\frac{\partial s(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T} \tag{1.3} \end{equation}\]
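As a concrete illustration of (1.1)-(1.3), here is a minimal NumPy sketch for an i.i.d. Poisson sample written in terms of its canonical parameter \(\theta = \log\mu\); the data vector `y` is made up purely for illustration, and with a single parameter the score and Hessian reduce to scalars.

```python
import numpy as np
from scipy.special import gammaln

# Hypothetical data, for illustration only.
y = np.array([2, 0, 3, 1, 4, 2, 1])
n = len(y)

def loglik(theta):
    # (1.1): sum of Poisson log-densities with canonical parameter theta = log(mu)
    return np.sum(y * theta - np.exp(theta) - gammaln(y + 1))

def score(theta):
    # (1.2): first derivative of the log-likelihood
    return np.sum(y) - n * np.exp(theta)

def hessian(theta):
    # (1.3): second derivative of the log-likelihood
    return -n * np.exp(theta)
```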
Newton-Raphson seeks \(\boldsymbol{\theta}_{n+1}\) such that \(s(\boldsymbol{\theta}_{n+1}) = \mathbf{0}\). A first-order (linear) approximation of the score around \(\boldsymbol{\theta}_{n}\) gives
\[\begin{equation} s(\boldsymbol{\theta}_{n+1}) =\mathbf{0} \approx s(\boldsymbol{\theta}_{n}) + H(\boldsymbol{\theta}_{n})(\boldsymbol{\theta}_{n+1}-\boldsymbol{\theta}_{n}) \tag{1.4} \end{equation}\]
Solving (1.4) for \(\boldsymbol{\theta}_{n+1}\) gives the Newton-Raphson update
\[\begin{equation} \boldsymbol{\theta}_{n+1} = \boldsymbol{\theta}_{n} - H\left(\boldsymbol{\theta}_{n}\right)^{-1}s\left(\boldsymbol{\theta}_{n}\right) \tag{1.5} \end{equation}\]
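Below is a minimal Newton-Raphson loop implementing (1.5) for the same made-up Poisson sample (a sketch only; the starting value is arbitrary). In this one-parameter case the iteration converges to the familiar MLE \(\hat{\theta} = \log \bar{y}\).

```python
import numpy as np

# Hypothetical data, for illustration only.
y = np.array([2, 0, 3, 1, 4, 2, 1])
n = len(y)

theta = 0.0  # arbitrary starting value
for _ in range(25):
    s = np.sum(y) - n * np.exp(theta)   # score (1.2)
    H = -n * np.exp(theta)              # Hessian (1.3)
    step = -s / H                       # (1.5): theta_new = theta - H^{-1} s
    theta += step
    if abs(step) < 1e-10:
        break

print(theta, np.log(y.mean()))  # both print log(ybar)
```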
Define the Expected Information as
\[\begin{equation} I(\boldsymbol{\theta})=\mathrm{E}(-H(\boldsymbol{\theta})) \tag{2.1} \end{equation}\]
Replacing the Observed Information \(-H(\boldsymbol{\theta}_{n})\) with the Expected Information \(I(\boldsymbol{\theta}_{n})\) yields the Fisher scoring update
\[\begin{equation} \boldsymbol{\theta}_{n+1}=\boldsymbol{\theta}_{n}+I\left(\boldsymbol{\theta}_{n}\right)^{-1} s\left(\boldsymbol{\theta}_{n}\right) \tag{2.2} \end{equation}\]
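Fisher scoring is most interesting when the Expected and Observed Information differ. A standard example is the location parameter of a Cauchy\((\theta, 1)\) sample, whose per-observation Fisher information is \(1/2\), so \(I(\theta) = n/2\) while the Hessian depends on the data. The sketch below uses invented data and starts from the sample median; it illustrates (2.2) and is not a robust implementation.

```python
import numpy as np

# Hypothetical Cauchy(theta, 1) sample, for illustration only.
y = np.array([-1.2, 0.4, 0.9, 2.3, 0.1, -0.5, 1.7])
n = len(y)

theta = np.median(y)  # reasonable starting value for a location parameter
for _ in range(100):
    r = y - theta
    s = np.sum(2 * r / (1 + r**2))  # score of the Cauchy log-likelihood
    I = n / 2.0                     # Expected Information (2.1)
    step = s / I                    # (2.2): theta_new = theta + I^{-1} s
    theta += step
    if abs(step) < 1e-10:
        break

print(theta)
```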
The Exponential Dispersion Family is defined as
\[\begin{equation} f\left(y_{i} ; \theta_{i}, \phi\right)=\exp \left\{\left[y_{i} \theta_{i}-b\left(\theta_{i}\right)\right] / a(\phi)+c\left(y_{i}, \phi\right)\right\} \tag{3.1} \end{equation}\]
A well-known relationship between the mean, the variance, and \(b(\theta)\) is shown below (Agresti 2016)
\[\begin{equation} \mu_{i}=E\left(y_{i}\right)=b^{\prime}\left(\theta_{i}\right) \quad \text { and } \quad \operatorname{var}\left(y_{i}\right)=b^{\prime \prime}\left(\theta_{i}\right) a(\phi) \tag{3.2} \end{equation}\]
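For example, the Poisson distribution can be written in the form (3.1) with
\[
f(y_i;\theta_i) = \exp\left\{y_i\theta_i - e^{\theta_i} - \log(y_i!)\right\},
\qquad b(\theta_i) = e^{\theta_i},\quad a(\phi) = 1,\quad \theta_i = \log\mu_i,
\]
so that \(\mu_i = b^{\prime}(\theta_i) = e^{\theta_i}\) and \(\operatorname{var}(y_i) = b^{\prime\prime}(\theta_i)\,a(\phi) = e^{\theta_i} = \mu_i\), consistent with (3.2).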
By the definition of the canonical link,
\[\begin{equation} \theta_i = \eta_{i} = \sum_{j=1}^{p} \beta_{j} x_{i j} \tag{3.3} \end{equation}\]
The log-likelihood contribution of observation \(i\) becomes
\[\begin{equation} \ell(\eta_{i}) = \left[y_{i} \eta_{i}-b\left(\eta_{i}\right)\right] / a(\phi)+c\left(y_{i}, \phi\right) \tag{3.4} \end{equation}\]
Differentiating with respect to the coefficients shows that the Observed Information is free of the random variable \(y_i\):
\[\begin{equation} \frac{\partial \ell(\eta_{i})}{\partial \beta_j} = \frac{y_i-b^{'}(\eta_{i})}{a(\phi)}x_{ij} \tag{3.5} \end{equation}\]
\[\begin{equation} \frac{\partial^2 \ell(\eta_{i})}{\partial \beta_j \partial \beta_h} = \frac{-b^{''}(\eta_{i})}{a(\phi)}x_{ij}x_{ih} \tag{3.6} \end{equation}\]
Equation (3.6) is free of the random variable \(y_i\). Hence, \(-E\left(\frac{\partial^{2} \ell\left(\eta_{i}\right)}{\partial \beta_{j} \partial \beta_{h}}\right)=-\frac{\partial^{2} \ell\left(\eta_{i}\right)}{\partial \beta_{j} \partial \beta_{h}}\). In other words, the Observed Information and the Expected Information are the same for a GLM with a canonical link.
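As a quick symbolic check of (3.5) and (3.6), consider the Bernoulli model with the canonical logit link, so \(b(\eta) = \log(1 + e^{\eta})\) and \(a(\phi) = 1\); the two-covariate setup below is chosen purely for illustration.

```python
import sympy as sp

y, x1, x2, b1, b2 = sp.symbols('y x1 x2 beta1 beta2')
eta = b1 * x1 + b2 * x2                  # (3.3): canonical link, eta = sum_j beta_j x_j
ell = y * eta - sp.log(1 + sp.exp(eta))  # (3.4): Bernoulli/logit log-likelihood, a(phi) = 1

d1 = sp.diff(ell, b1)        # (3.5): first derivative, involves y
d2 = sp.diff(ell, b1, b2)    # (3.6): second derivative
print(sp.simplify(d1))
print(sp.simplify(d2))
print(d2.has(y))             # False: the second derivative is free of y
```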
Overall, Fisher scoring and Newton-Raphson are identical for a GLM using a canonical link (Nelder and Wedderburn 1972).
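As a closing numeric sketch (data simulated purely for illustration), here is the update for a Bernoulli GLM with the canonical logit link: because the Hessian (3.6) does not involve \(y_i\), the same loop is simultaneously the Newton-Raphson step (1.5) and the Fisher scoring step (2.2).

```python
import numpy as np

# Simulated design matrix and binary response, for illustration only.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
beta_true = np.array([-0.3, 1.2])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))  # b'(eta) for the Bernoulli/logit model
    s = X.T @ (y - mu)                # score, stacking (3.5) over observations
    W = mu * (1 - mu)                 # b''(eta): the information weights
    H = -(X.T * W) @ X                # Hessian, stacking (3.6); E(-H) = -H here
    step = np.linalg.solve(-H, s)     # Newton (1.5) and Fisher scoring (2.2) coincide
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

print(beta)
```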