where \(y\) is \(1\) or \(0\).
Obviously, \(p_i\) is
\[\begin{equation} \begin{split} p_i &= \frac{exp(\beta_0+x_1 \beta_1)}{1 + exp(\beta_0+x_1 \beta_1)} \end{split} \tag{1.5} \end{equation}\]First partial derivative with respect to the variable \(\beta_0\)
\[\begin{equation} \begin{split} \frac{\partial p_i}{\partial \beta_0} &= \frac{exp(\beta_0+x_1 \beta_1)}{(1+exp(\beta_0+x_1 \beta_1))^2}\\ &= p_i (1-p_i) \end{split} \tag{1.6} \end{equation}\]Similiarly, we have
\[\begin{equation} \begin{split} \frac{\partial p_i}{\partial \beta_1} &= \frac{x_1 exp(\beta_0+x_1 \beta_1)}{(1+exp(\beta_0+x_1 \beta_1))^2}\\ &= x_1 p_i (1-p_i) \end{split} \tag{1.7} \end{equation}\]Combine equations (1.3) - (1.7)
\[\begin{equation} \begin{split} \frac{\partial \ell (\boldsymbol{\beta} | \mathbf{y} ; \mathbf{x})}{\partial \beta_0} & = \sum_{i=1}^{n} y_i \frac{\partial log(p_i)}{\partial p_i} \frac{\partial p_i}{\partial \beta_0} + (1-y_i) \frac{\partial log(1-p_i)}{\partial p_i} \frac{\partial p_i}{\partial \beta_0}\\ & = \sum_{i=1}^{n}\frac{y_i}{p_i} \frac{\partial p_i}{\partial \beta_0} - \frac{1 - y_i}{1- p_i} \frac{\partial p_i}{\partial \beta_0} \\ & = \sum_{i=1}^{n} y_i - p_i = 0 \end{split} \tag{1.8} \end{equation}\] \[\begin{equation} \begin{split} \frac{\partial \ell (\boldsymbol{\beta} | \mathbf{y} ; \mathbf{x})}{\partial \beta_1} & = \sum_{i=1}^{n} y_i \frac{\partial log(p_i)}{\partial p_i} \frac{\partial p_i}{\partial \beta_1} + (1-y_i) \frac{\partial log(1-p_i)}{\partial p_i} \frac{\partial p_i}{\partial \beta_1}\\ & = \sum_{i=1}^{n}\frac{y_i}{p_i} \frac{\partial p_i}{\partial \beta_1} - \frac{1 - y_i}{1- p_i} \frac{\partial p_i}{\partial \beta_1} \\ & = \sum_{i=1}^{n} x_i (y_i - p_i) = 0 \end{split} \tag{1.9} \end{equation}\]We define \(\varphi(x)\) as pdf function of standard normal distribution and define \(\Phi(x)\) as cdf function of standard normal distribution. Here we don’t need to show what exactly \(\varphi(x)\) and \(\Phi(x)\) are.
For this section, only relationship between \(\varphi(x)\) and \(\Phi(x)\) matters.
\[\begin{equation} \begin{split} \varphi(x) = \frac{d\Phi(x)}{dx} \end{split} \tag{2.1} \end{equation}\]Likelihood and loglikelihood of Logistic Regression with probit link are as same as (1.2) and (1.3)
Equivalently
\[\begin{equation} \begin{split} p_i = \Phi(\Phi^{-1}(p_i)) = \Phi(\beta_0+\beta_1 x_i) \end{split} \tag{2.3} \end{equation}\]Based on equation (2.1), \(\frac{\partial \Phi(\beta_0 + \beta_1 x_i)}{\partial (\beta_0 + \beta_1 x_i)} = \varphi(\beta_0 + \beta_1 x_i)\). First partial derivative with respect to the variable \(\beta_0\):
\[\begin{equation} \begin{split} \frac{\partial p_i}{\partial \beta_0} = \frac{\partial \Phi(\beta_0+\beta_1 x_i)}{\partial (\beta_0+\beta_1 x_i)} \frac{\partial (\beta_0+\beta_1 x_i)}{\partial \beta_0} = \varphi(\beta_0+\beta_1 x_i) \end{split} \tag{2.4} \end{equation}\]Similarly, we have
\[\begin{equation} \begin{split} \frac{\partial p_i}{\partial \beta_1} = \frac{\partial \Phi(\beta_0+\beta_1 x_i)}{\partial (\beta_0+\beta_1 x_i)} \frac{\partial (\beta_0+\beta_1 x_i)}{\partial \beta_1} = \varphi(\beta_0+\beta_1 x_i) x_i \end{split} \tag{2.5} \end{equation}\]library(tidyverse)
library(cowplot)
library(numDeriv)
sessionInfo()
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] numDeriv_2016.8-1 forcats_0.3.0 stringr_1.3.0
## [4] purrr_0.2.4 readr_1.1.1 tibble_1.4.2
## [7] tidyverse_1.2.1 optimization_1.0-7 Rcpp_0.12.16
## [10] lme4_1.1-17 Matrix_1.2-14 tidyr_0.8.0
## [13] cowplot_0.9.2 dplyr_0.7.4 ggplot2_2.2.1
##
## loaded via a namespace (and not attached):
## [1] lubridate_1.7.4 lattice_0.20-35 utf8_1.1.3
## [4] assertthat_0.2.0 rprojroot_1.3-2 digest_0.6.15
## [7] psych_1.8.4 R6_2.2.2 cellranger_1.1.0
## [10] plyr_1.8.4 backports_1.1.2 evaluate_0.10.1
## [13] httr_1.3.1 highr_0.6 pillar_1.2.1
## [16] rlang_0.2.0 lazyeval_0.2.1 readxl_1.1.0
## [19] minqa_1.2.4 rstudioapi_0.7 nloptr_1.0.4
## [22] rmarkdown_1.9 labeling_0.3 splines_3.4.4
## [25] foreign_0.8-69 munsell_0.4.3 broom_0.4.4
## [28] compiler_3.4.4 modelr_0.1.1 xfun_0.1
## [31] pkgconfig_2.0.1 mnormt_1.5-5 htmltools_0.3.6
## [34] tidyselect_0.2.4 bookdown_0.7 crayon_1.3.4
## [37] MASS_7.3-49 grid_3.4.4 nlme_3.1-137
## [40] jsonlite_1.5 gtable_0.2.0 magrittr_1.5
## [43] scales_0.5.0 cli_1.0.0 stringi_1.1.7
## [46] reshape2_1.4.3 bindrcpp_0.2.2 xml2_1.2.0
## [49] RColorBrewer_1.1-2 tools_3.4.4 glue_1.2.0
## [52] hms_0.4.2 parallel_3.4.4 yaml_2.1.18
## [55] colorspace_1.3-2 rvest_0.3.2 knitr_1.20
## [58] bindr_0.1.1 haven_1.1.1
In this section, I will use three methods, Newton-Raphson, Fisher Scoring and IRLS(iteratively reweighted least squares) to estimate \(\beta_0, \beta_1\). The data set used in this section came from execrise 17.1 of Richard M. Heiberger (2015).
Newton-Raphson’s equation is
\[\begin{equation} \begin{split} \boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} - (\boldsymbol{H}^{(t)})^{-1}\boldsymbol{u}^{(t)} \end{split} \tag{4.1} \end{equation}\]Where
\(\boldsymbol{\beta}^{(t)} = \begin{bmatrix} \beta_{0}^{(t)}\\ \beta_{1}^{(t)} \end{bmatrix}\)
From equation (1.8) and (1.9), we have
\(\boldsymbol{u}^{(t)} = \begin{bmatrix} u_{0}^{(t)}\\ u_{1}^{(t)} \end{bmatrix} = \begin{bmatrix} \frac{\partial \boldsymbol{\ell} (\boldsymbol(\beta)^{(t)}|\boldsymbol{y};\boldsymbol{x})}{\partial \beta_0}\\ \frac{\partial \boldsymbol{\ell} (\boldsymbol(\beta)^{(t)}|\boldsymbol{y};\boldsymbol{x})}{\partial \beta_1} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} y_i - p_i^{(t)}\\ \sum_{i=1}^{n} x_i(y_i - p_i^{(t)}) \end{bmatrix}\)
From equation (1.3), we have
\(\ell (\boldsymbol{\beta}^{(t)} | \mathbf{y} ; \mathbf{x}) = \sum_{i=1}^{n} y_{i} log(p_{i}^{(t)}) + (1-y_{i})log(1-p_{i}^{(t)})\)
From equation (1.5), we have
\(p_{i}^{(t)} = \frac{exp(\beta_{0}^{(t)}+x_1 \beta_{1}^{(t)})}{1 + exp(\beta_{0}^{(t)}+x_{1} \beta_{1}^{(t)})}\)
\(\boldsymbol{H}^{(t)}\) can be considered as Jacobian matrix of \(\boldsymbol{u}(\cdot)\),
\(\boldsymbol{H}^{(t)} = \begin{bmatrix} \frac{\partial u_{0}^{(t)}}{\partial \beta_0} & \frac{\partial u_{0}^{(t)}}{\partial \beta_1}\\ \frac{\partial u_{1}^{(t)}}{\partial \beta_0} & \frac{\partial u_{1}^{(t)}}{\partial \beta_1} \end{bmatrix}\)
Last, \(\boldsymbol{H}^{(t)}\) can also be considered as Hessian matrix of \(\boldsymbol{\ell}(\cdot)\),
\(\boldsymbol{H}^{(t)} = \begin{bmatrix} \frac{\partial^2 \boldsymbol{\ell}(\boldsymbol{\beta}^{(t)} | \boldsymbol{y};\boldsymbol{x})}{\partial \beta_0^2} & \frac{\partial^2 \boldsymbol{\ell}(\boldsymbol{\beta}^{(t)} | \boldsymbol{y};\boldsymbol{x})}{\partial \beta_0 \beta_1} \\ \frac{\partial^2 \boldsymbol{\ell}(\boldsymbol{\beta}^{(t)} | \boldsymbol{y};\boldsymbol{x})}{\partial \beta_0 \beta_1} & \frac{\partial^2 \boldsymbol{\ell}(\boldsymbol{\beta}^{(t)} | \boldsymbol{y};\boldsymbol{x})}{\partial \beta_1^2} \end{bmatrix}\)
##Create a simple data frame
df1 <- tibble(xx = 1:4,
y = c(1,0,1,0))
df1
## # A tibble: 4 x 2
## xx y
## <int> <dbl>
## 1 1 1.
## 2 2 0.
## 3 3 1.
## 4 4 0.
Using equations (1.8) and (1.9), we can get same coefficients’ estimation via Newton-Raphson Method.
func.u <- function(x) c(sum(df1$y - exp(x[1] + x[2] * df1$xx)/ (1 + exp(x[1] + x[2] * df1$xx))),
sum(df1$xx * (df1$y - exp(x[1] + x[2] * df1$xx)/ (1 + exp(x[1] + x[2] * df1$xx)))))
delta <- matrix(1:2,nrow = 2)
x <- array(c(2,-1))
while(abs(sum(delta)) > 0.0001){
xx <- x ##current x value
x <- as.matrix(x) - solve(jacobian(func.u, x = x)) %*% func.u(x) ##new x value
delta <- x - as.matrix(xx)
}
x
## [,1]
## [1,] 2.2704607
## [2,] -0.9081843
Fisher Socring’s equation is
\[\begin{equation} \begin{split} \boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} + (\boldsymbol{J}^{(t)})^{-1}\boldsymbol{u}^{(t)} \end{split} \tag{4.2} \end{equation}\]Where
\(\boldsymbol{J}^{(t)} = \boldsymbol{X}^{T} \boldsymbol{W}^{(t)} \boldsymbol{X}\), \(\boldsymbol{X}\) is design matrix.
\(\boldsymbol{W}^{(t)} = \text{diag} ((\frac{\partial \mu_i^{(t)}}{\partial (\beta_0 + x_i\beta_1)})^2 \frac{1}{\text{var}(y_i)})\)
\(\frac{\partial \mu_i^{(t)}}{\partial (\beta_0 + x_i\beta_1)} = \frac{exp(\beta_0 + x_i\beta_1)^{(t)}}{(1+exp(\beta_0 + x_i\beta_1)^{(t)})^2}\)
From eq (1.5), we have
\(\text{var}(y_i) = p_i(1-p_i) = \frac{exp(\beta_0 + x_i\beta_1)^{(t)}}{(1+exp(\beta_0 + x_i\beta_1)^{(t)})^2}\)
So
\((\frac{\partial \mu_i^{(t)}}{\partial (\beta_0 + x_i\beta_1)})^2 \frac{1}{\text{var}(y_i)} = \frac{exp(\beta_0 + x_i\beta_1)^{(t)}}{(1+exp(\beta_0 + x_i\beta_1)^{(t)})^2}\)
Eventually, we have
\(\boldsymbol{W}^{(t)} = \text{diag}(\frac{exp(\beta_0 + x_i\beta_1)}{(1+exp(\beta_0 + x_i\beta_1))^2})^{(t)}\)
X <- cbind(rep(1,4),df1$xx)
X <- as.matrix(X)
func1 <- function(x) {
x <- X %*% matrix(c(x[1],x[2]), nrow = 2)
exp(x)/(1+exp(x))^2
}
model1 <- function(x) c(sum(df1$y - exp(x[1] + x[2] * df1$xx)/ (1 + exp(x[1] + x[2] * df1$xx))),
sum(df1$xx * (df1$y - exp(x[1] + x[2] * df1$xx)/ (1 + exp(x[1] + x[2] * df1$xx)))))
delta <- matrix(1:2,nrow = 2)
x <- array(c(2,-1))
while (abs(sum(delta)) > 0.0001) {
xx <- x ##current x value
W <- diag(array(func1(x = x)))
J <- t(X) %*% W %*% X
x <- as.matrix(x) + solve(J) %*% model1(x)
delta <- x - as.matrix(xx)
}
x
## [,1]
## [1,] 2.2704607
## [2,] -0.9081843
Iteratively reweighted least squares’ equation is
\[\begin{equation} \begin{split} \boldsymbol{\beta}^{(t+1)} = (\boldsymbol{J}^{(t)})^{-1}\boldsymbol{X}^{T}\boldsymbol{W}^{(t)}\boldsymbol{Z}^{(t)} \end{split} \tag{4.3} \end{equation}\]Where
\(\boldsymbol{Z}^{(t)} = \boldsymbol{X} \boldsymbol{\beta}^{(t)} + (\boldsymbol{D}^{(t)})^{-1}(\boldsymbol{y} - \boldsymbol{p}^{(t)})\)
\(\boldsymbol{D}^{(t)} = \text{ diag }(\frac{\partial \mu_i ^{(t)}}{\partial (\beta_0 + \beta_1x_i)}) = \text{diag}(\frac{exp(\beta_0 + x_i\beta_1)}{(1+exp(\beta_0 + x_i\beta_1))^2})^{(t)}\)
X <- cbind(rep(1,4),df1$xx)
X <- as.matrix(X) #design matrix X
func.mu <- function(x) {
upeta <- X %*% matrix(c(x[1],x[2]), nrow = 2)
return(exp(upeta)/(1+exp(upeta)))
}
func.W <- function(x) {
upeta <- X %*% matrix(c(x[1],x[2]), nrow = 2)
return(exp(upeta)/(1+exp(upeta))^2)
}
func.D <- function(x) {
upeta <- X %*% matrix(c(x[1],x[2]), nrow = 2)
return(exp(upeta)/(1+exp(upeta))^2)
}
delta <- matrix(1:2,nrow = 2)
x <- array(c(2,-1)) #inital values for beta0 and beta1
while (abs(sum(delta)) > 0.0001) {
xx <- x ##current x value
W <- diag(array(func.W(x = x)))
J <- t(X) %*% W %*% X
D <- diag(array(func.D(x = x)))
z <- X %*% x + solve(D) %*% (df1$y - func.mu(x))
x <- solve(J) %*% t(X) %*% W %*% z
delta <- x - as.matrix(xx)
}
x
## [,1]
## [1,] 2.2704607
## [2,] -0.9081843
Probit link Newton-Raphson equation looks as same as (4.1)
\[\begin{equation} \begin{split} \boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} - (\boldsymbol{H}^{(t)})^{-1}\boldsymbol{u}^{(t)} \end{split} \tag{4.4} \end{equation}\]However, \(\boldsymbol{u}^{(t)}\) is acutally different so is \(\boldsymbol{H}^{(t)}\).
\(\boldsymbol{u}^{(t)} = \begin{bmatrix} \frac{\partial \boldsymbol{\ell} (\boldsymbol(\beta)^{(t)}|\boldsymbol{y};\boldsymbol{x})}{\partial \beta_0}\\ \frac{\partial \boldsymbol{\ell} (\boldsymbol(\beta)^{(t)}|\boldsymbol{y};\boldsymbol{x})}{\partial \beta_1} \end{bmatrix}\), \(\frac{\partial \boldsymbol{\ell} (\cdot) }{\partial \beta_0}\) and \(\frac{\partial \boldsymbol{\ell} (\cdot) }{\partial \beta_1}\) are as same as equations (2.6) and (2.7).
\(\boldsymbol{H}^{(t)} = \begin{bmatrix} \frac{\partial^2 \boldsymbol{\ell}(\boldsymbol{\beta}^{(t)} | \boldsymbol{y};\boldsymbol{x})}{\partial \beta_0^2} & \frac{\partial^2 \boldsymbol{\ell}(\boldsymbol{\beta}^{(t)} | \boldsymbol{y};\boldsymbol{x})}{\partial \beta_0 \beta_1} \\ \frac{\partial^2 \boldsymbol{\ell}(\boldsymbol{\beta}^{(t)} | \boldsymbol{y};\boldsymbol{x})}{\partial \beta_0 \beta_1} & \frac{\partial^2 \boldsymbol{\ell}(\boldsymbol{\beta}^{(t)} | \boldsymbol{y};\boldsymbol{x})}{\partial \beta_1^2} \end{bmatrix}\), \(\boldsymbol{H}^{(t)}\) are determined by \(\boldsymbol{u}^{(t)}\), hence will be different from equations from MLE-logit section.
model2 <- function(x) c(sum((dnorm(x[1]+x[2] * df1$xx))*(df1$y - pnorm(x[1]+x[2] * df1$xx))/
((pnorm(x[1]+x[2] * df1$xx))*(1-pnorm(x[1]+x[2] * df1$xx)))),
sum((dnorm(x[1]+x[2] * df1$xx))*(df1$y - pnorm(x[1]+x[2] * df1$xx))/
((pnorm(x[1]+x[2] * df1$xx))*(1-pnorm(x[1]+x[2] * df1$xx)))*df1$xx)
)
delta <- matrix(1:2,nrow = 2)
x <- array(c(2,-2))
while(abs(sum(delta)) > 0.0001){
xx <- x ##current x value
x <- as.matrix(x) - solve(jacobian(model2, x = x)) %*% model2(x) ##new x value
delta <- x - as.matrix(xx)
}
x
## [,1]
## [1,] 1.477002
## [2,] -0.590801
Fisher Socring’s equation is
\[\begin{equation} \begin{split} \boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} + (\boldsymbol{J}^{(t)})^{-1}\boldsymbol{u}^{(t)} \end{split} \tag{4.5} \end{equation}\]Where \(\boldsymbol{J}^{(t)} = \boldsymbol{X}^{T} \boldsymbol{W}^{(t)} \boldsymbol{X}\)
\(\boldsymbol{W}^{(t)} = \text{ diag }((\frac{\partial \mu_i^{(t)}}{\partial (\beta_0 + x_i\beta_1)})^2 \frac{1}{\text{var}(y_i)})\)
\(\frac{\partial \mu_i}{\partial \beta_0 + x_1\beta_1} = \frac{\partial p_i}{\partial \beta_0 + x_1\beta_1} = \varphi(\beta_0 + x_1\beta_1)\)
\(\text{Var}(y_i) = p_i (1-p_i) = \Phi(\beta_0 + x_1\beta_1)(1-\Phi(\beta_0 + x_1\beta_1))\)
\(\boldsymbol{W}^{(t)} = \text{ diag }(\frac{\varphi^2(\beta_0 + x_1\beta_1)}{\Phi(\beta_0 + x_1\beta_1)(1-\Phi(\beta_0 + x_1\beta_1))})^{(t)}\)
X <- cbind(rep(1,4),df1$xx)
X <- as.matrix(X)
func2 <- function(x) {
x <- X %*% matrix(c(x[1],x[2]), nrow = 2)
dnorm(x)^2/(pnorm(x)*(1-pnorm(x)))
}
model2 <- function(x) c(sum((dnorm(x[1]+x[2] * df1$xx))*(df1$y - pnorm(x[1]+x[2] * df1$xx))/
((pnorm(x[1]+x[2] * df1$xx))*(1-pnorm(x[1]+x[2] * df1$xx)))),
sum((dnorm(x[1]+x[2] * df1$xx))*(df1$y - pnorm(x[1]+x[2] * df1$xx))/
((pnorm(x[1]+x[2] * df1$xx))*(1-pnorm(x[1]+x[2] * df1$xx)))*df1$xx)
)
delta <- matrix(1:2,nrow = 2)
x <- array(c(2,-1))
while (abs(sum(delta)) > 0.0001) {
xx <- x ##current x value
W <- diag(array(func2(x = x)))
J <- t(X) %*% W %*% X
x <- as.matrix(x) + solve(J) %*% model2(x)
delta <- x - as.matrix(xx)
}
x
## [,1]
## [1,] 1.4769988
## [2,] -0.5907995
Iteratively reweighted least squares’ equation is
\[\begin{equation} \begin{split} \boldsymbol{\beta}^{(t+1)} = (\boldsymbol{J}^{(t)})^{-1}\boldsymbol{X}^{T}\boldsymbol{W}^{(t)}\boldsymbol{Z}^{(t)} \end{split} \tag{4.6} \end{equation}\]Where
\(\boldsymbol{Z}^{(t)} = \boldsymbol{X} \boldsymbol{\beta}^{(t)} + (\boldsymbol{D}^{(t)})^{-1}(\boldsymbol{y} - \boldsymbol{p}^{(t)})\)
\(\boldsymbol{D}^{(t)} = \text{ diag }(\frac{\partial \mu_i ^{(t)}}{\partial (\beta_0 + \beta_1x_i)}) = \text{diag }(\varphi(\beta_0 + \beta_1x_i))^{(t)}\)
df1 <- data.frame(xx = 1:4,
y = c(1,0,1,0))
X <- cbind(rep(1,4),df1$xx)
X <- as.matrix(X) #design matrix X
func.mu <- function(x) {
upeta <- X %*% matrix(c(x[1],x[2]), nrow = 2)
return(pnorm(upeta))
}
func.W <- function(x) {
upeta <- X %*% matrix(c(x[1],x[2]), nrow = 2)
return((dnorm(upeta))^2/(pnorm(upeta)*(1-pnorm(upeta))))
}
func.D <- function(x) {
upeta <- X %*% matrix(c(x[1],x[2]), nrow = 2)
return(dnorm(upeta))
}
delta <- matrix(1:2,nrow = 2)
x <- array(c(2,-1)) #inital values for beta0 and beta1
while (abs(sum(delta)) > 0.0001) {
xx <- x ##current x value
W <- diag(array(func.W(x = x)))
J <- t(X) %*% W %*% X
D <- diag(array(func.D(x = x)))
z <- X %*% x + solve(D) %*% (df1$y - func.mu(x))
x <- solve(J) %*% t(X) %*% W %*% z
delta <- x - as.matrix(xx)
}
x
## [,1]
## [1,] 1.4769988
## [2,] -0.5907995
glm
outputUsing stats::glm
function we have identical estimations comparing from MLE-logit link via Newton-Raphson to MLE-logit link via IRLS.
However, for probit link, estimations we have from MLE-probit link via Newton-Raphson to MLE-probit link via IRLS are slightly different from estimations via stats::glm
function. There must be hiden secret that I don’t know. It will be inteseting to find out how exactly stats::glm
do estimation for logistic regression.
##Logit logistic regression
glm1 <- glm(y ~ xx, data=df1, family=binomial)
glm1$coefficients
## (Intercept) xx
## 2.2704607 -0.9081843
##Build probit logistic regression
glm2 <- glm(y ~ xx, data=df1, family=binomial(link = "probit"))
glm2$coefficients
## (Intercept) xx
## 1.4769772 -0.5907909