# Maximum Likelihood Estimation in Logistic Regression (logit link)

## Bernoulli distribution

The pmf of the Bernoulli distribution is \[\begin{equation} P(Y=y) = p^{y}(1-p)^{1-y} \tag{1} \end{equation}\] where \(y \in \{0, 1\}\) and \(p = P(Y = 1)\).
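As a quick sanity check, the pmf in (1) matches R's built-in `dbinom()` with `size = 1` (the value of `p` below is arbitrary):

```r
p <- 0.3
# Equation (1) evaluated at y = 1 and y = 0
c(p^1 * (1 - p)^0, p^0 * (1 - p)^1)
dbinom(c(1, 0), size = 1, prob = p)  # same values: 0.3 0.7
```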

## Likelihood of Logistic Regression

\[\begin{equation} \begin{split} L(\boldsymbol{\beta} | \mathbf{y} ; \mathbf{x}) &= L(\beta_0,\beta_1|(y_1,...,y_n);(x_1,...,x_n)) \\ &= \prod^{n}_{i=1} p_{i}^{y_i}(1-p_i)^{1-y_i} \end{split} \tag{2} \end{equation}\]

## Log-likelihood of Logistic Regression

\[\begin{equation} \begin{split} \ell (\boldsymbol{\beta} | \mathbf{y} ; \mathbf{x}) & = \log\left(\prod^{n}_{i=1} p_{i}^{y_i}(1-p_{i})^{1-y_i}\right) \\ & = \sum_{i=1}^{n} \left[ \log(p_{i}^{y_i}) + \log((1-p_{i})^{1-y_i}) \right] \\ & = \sum_{i=1}^{n} \left[ y_{i} \log(p_{i}) + (1-y_{i})\log(1-p_{i}) \right] \end{split} \tag{3} \end{equation}\]
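A minimal R sketch of (3), assuming the logit link so that \(p_i\) can be computed with `plogis()` (the function name `loglik_logit` and the simulated data are illustrative, not part of the original derivation):

```r
# Log-likelihood (3) under the logit link
loglik_logit <- function(beta, x, y) {
  p <- plogis(beta[1] + beta[2] * x)  # p_i = exp(eta_i) / (1 + exp(eta_i))
  sum(y * log(p) + (1 - y) * log(1 - p))
}

set.seed(1)
x <- rnorm(50)
y <- rbinom(50, size = 1, prob = plogis(-0.5 + 1.2 * x))
loglik_logit(c(-0.5, 1.2), x, y)
```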

## Partial Derivatives of the log-likelihood function

Combining equations (3) through (7): for the logit link, \(p_i = e^{\beta_0+\beta_1 x_i}/(1+e^{\beta_0+\beta_1 x_i})\), so \(\partial p_i/\partial \beta_0 = p_i(1-p_i)\) and \(\partial p_i/\partial \beta_1 = p_i(1-p_i)x_i\). Substituting these into the derivatives of (3) and setting them to zero yields the score equations:

\[\begin{equation} \begin{split} \frac{\partial \ell (\boldsymbol{\beta} | \mathbf{y} ; \mathbf{x})}{\partial \beta_0} & = \sum_{i=1}^{n} \left[ y_i \frac{\partial \log(p_i)}{\partial p_i} \frac{\partial p_i}{\partial \beta_0} + (1-y_i) \frac{\partial \log(1-p_i)}{\partial p_i} \frac{\partial p_i}{\partial \beta_0} \right] \\ & = \sum_{i=1}^{n} \left[ \frac{y_i}{p_i} \frac{\partial p_i}{\partial \beta_0} - \frac{1 - y_i}{1- p_i} \frac{\partial p_i}{\partial \beta_0} \right] \\ & = \sum_{i=1}^{n} (y_i - p_i) = 0 \end{split} \tag{8} \end{equation}\]

\[\begin{equation} \begin{split} \frac{\partial \ell (\boldsymbol{\beta} | \mathbf{y} ; \mathbf{x})}{\partial \beta_1} & = \sum_{i=1}^{n} \left[ y_i \frac{\partial \log(p_i)}{\partial p_i} \frac{\partial p_i}{\partial \beta_1} + (1-y_i) \frac{\partial \log(1-p_i)}{\partial p_i} \frac{\partial p_i}{\partial \beta_1} \right] \\ & = \sum_{i=1}^{n} \left[ \frac{y_i}{p_i} \frac{\partial p_i}{\partial \beta_1} - \frac{1 - y_i}{1- p_i} \frac{\partial p_i}{\partial \beta_1} \right] \\ & = \sum_{i=1}^{n} x_i (y_i - p_i) = 0 \end{split} \tag{9} \end{equation}\]
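The closed forms in (8) and (9) can be verified numerically with `numDeriv` (attached in the package section below); this sketch reuses the illustrative `loglik_logit()`, `x`, and `y` from above:

```r
library(numDeriv)

# Analytic score implied by (8) and (9)
score_logit <- function(beta, x, y) {
  p <- plogis(beta[1] + beta[2] * x)
  c(sum(y - p), sum(x * (y - p)))
}

beta <- c(-0.5, 1.2)
score_logit(beta, x, y)
grad(function(b) loglik_logit(b, x, y), beta)  # numerical gradient; should agree
```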

# Maximum Likelihood Estimation in Logistic Regression (probit link)

## Normal distribution

We define \(\varphi(x)\) as the pdf of the standard normal distribution and \(\Phi(x)\) as its cdf. We do not need the explicit forms of \(\varphi(x)\) and \(\Phi(x)\) here.

For this section, only the relationship between \(\varphi(x)\) and \(\Phi(x)\) matters:

\[\begin{equation} \begin{split} \varphi(x) = \frac{d\Phi(x)}{dx} \end{split} \tag{10} \end{equation}\]
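In R this is the relationship between `pnorm()` and `dnorm()`, which can be checked numerically (the evaluation point is arbitrary):

```r
x0 <- 0.7
dnorm(x0)                  # phi(x0)
numDeriv::grad(pnorm, x0)  # numerical derivative of Phi at x0; should agree
```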

## Likelihood and log-likelihood of Logistic Regression with probit link

The likelihood and log-likelihood with the probit link are the same as (2) and (3); only the definition of \(p_i\) changes.

## Probit link

\[\begin{equation} \begin{split} \text{probit}(p_i) = \Phi^{-1}(p_i) = \beta_0 + \beta_1 x_i \end{split} \tag{11} \end{equation}\]

Equivalently,

\[\begin{equation} \begin{split} p_i = \Phi(\Phi^{-1}(p_i)) = \Phi(\beta_0+\beta_1 x_i) \end{split} \tag{12} \end{equation}\]
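In R, \(\Phi\) and \(\Phi^{-1}\) are `pnorm()` and `qnorm()`, so (11) and (12) can be illustrated directly (the coefficient values are arbitrary):

```r
beta0 <- -0.5; beta1 <- 1.2; xi <- 0.3
p_i <- pnorm(beta0 + beta1 * xi)  # equation (12)
qnorm(p_i)                        # equation (11): recovers beta0 + beta1 * xi = -0.14
```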

Based on equation (10), \(\frac{\partial \Phi(\beta_0 + \beta_1 x_i)}{\partial (\beta_0 + \beta_1 x_i)} = \varphi(\beta_0 + \beta_1 x_i)\). The first partial derivative with respect to \(\beta_0\) is:

\[\begin{equation} \begin{split} \frac{\partial p_i}{\partial \beta_0} = \frac{\partial \Phi(\beta_0+\beta_1 x_i)}{\partial (\beta_0+\beta_1 x_i)} \frac{\partial (\beta_0+\beta_1 x_i)}{\partial \beta_0} = \varphi(\beta_0+\beta_1 x_i) \end{split} \tag{13} \end{equation}\]

Similarly, we have

\[\begin{equation} \begin{split} \frac{\partial p_i}{\partial \beta_1} = \frac{\partial \Phi(\beta_0+\beta_1 x_i)}{\partial (\beta_0+\beta_1 x_i)} \frac{\partial (\beta_0+\beta_1 x_i)}{\partial \beta_1} = \varphi(\beta_0+\beta_1 x_i) x_i \end{split} \tag{14} \end{equation}\]

## Partial Derivatives of the log-likelihood function

\[\begin{equation} \begin{split} \frac{\partial \ell (\boldsymbol{\beta} | \mathbf{y} ; \mathbf{x})}{\partial \beta_0} & = \sum_{i=1}^{n} \left[ y_i \frac{\partial \log(p_i)}{\partial p_i} \frac{\partial p_i}{\partial \beta_0} + (1-y_i) \frac{\partial \log(1-p_i)}{\partial p_i} \frac{\partial p_i}{\partial \beta_0} \right] \\ & = \sum_{i=1}^{n} \left[ \frac{y_i}{p_i} \frac{\partial p_i}{\partial \beta_0} - \frac{1 - y_i}{1- p_i} \frac{\partial p_i}{\partial \beta_0} \right] \\ & = \sum_{i=1}^{n} \left[ \frac{y_i \varphi(\beta_0 + \beta_1 x_i)}{\Phi(\beta_0 + \beta_1 x_i)} - \frac{(1-y_i)\varphi(\beta_0 + \beta_1 x_i)}{1 - \Phi(\beta_0 + \beta_1 x_i)} \right] \\ & = \sum_{i=1}^{n} \frac{\varphi(\beta_0 + \beta_1 x_i)(y_i - \Phi(\beta_0 + \beta_1 x_i))}{\Phi(\beta_0 + \beta_1 x_i)(1-\Phi(\beta_0+\beta_1 x_i))} \end{split} \tag{15} \end{equation}\]

\[\begin{equation} \begin{split} \frac{\partial \ell (\boldsymbol{\beta} | \mathbf{y} ; \mathbf{x})}{\partial \beta_1} & = \sum_{i=1}^{n} \left[ y_i \frac{\partial \log(p_i)}{\partial p_i} \frac{\partial p_i}{\partial \beta_1} + (1-y_i) \frac{\partial \log(1-p_i)}{\partial p_i} \frac{\partial p_i}{\partial \beta_1} \right] \\ & = \sum_{i=1}^{n} \left[ \frac{y_i}{p_i} \frac{\partial p_i}{\partial \beta_1} - \frac{1 - y_i}{1- p_i} \frac{\partial p_i}{\partial \beta_1} \right] \\ & = \sum_{i=1}^{n} \left[ \frac{y_i \varphi(\beta_0 + \beta_1 x_i)x_i}{\Phi(\beta_0 + \beta_1 x_i)} - \frac{(1-y_i)\varphi(\beta_0 + \beta_1 x_i)x_i}{1 - \Phi(\beta_0 + \beta_1 x_i)} \right] \\ & = \sum_{i=1}^{n} \frac{\varphi(\beta_0 + \beta_1 x_i)(y_i - \Phi(\beta_0 + \beta_1 x_i))}{\Phi(\beta_0 + \beta_1 x_i)(1-\Phi(\beta_0+\beta_1 x_i))}x_i \end{split} \tag{16} \end{equation}\]
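As in the logit case, (15) and (16) can be checked against a numerical gradient; this sketch assumes the probit analogue of the illustrative log-likelihood above and reuses the simulated `x` and `y` from the logit sketch:

```r
# Log-likelihood (3) under the probit link, p_i = Phi(beta0 + beta1 * x_i)
loglik_probit <- function(beta, x, y) {
  p <- pnorm(beta[1] + beta[2] * x)
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# Analytic score from (15) and (16)
score_probit <- function(beta, x, y) {
  eta <- beta[1] + beta[2] * x
  w   <- dnorm(eta) * (y - pnorm(eta)) / (pnorm(eta) * (1 - pnorm(eta)))
  c(sum(w), sum(w * x))
}

beta <- c(-0.5, 1.2)
score_probit(beta, x, y)
numDeriv::grad(function(b) loglik_probit(b, x, y), beta)  # should agree
```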

# Packages and version information

```r
library(tidyverse)
library(cowplot)
library(numDeriv)
sessionInfo()
## R version 4.2.1 (2022-06-23 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] numDeriv_2016.8-1.1 lattice_0.20-45     optimization_1.0-9 
##  [4] Rcpp_1.0.9          lme4_1.1-31         Matrix_1.5-3       
##  [7] nycflights13_1.0.2  forcats_0.5.1       stringr_1.4.0      
## [10] purrr_0.3.4         readr_2.1.2         tidyr_1.2.0        
## [13] tibble_3.1.8        tidyverse_1.3.2     cowplot_1.1.1      
## [16] dplyr_1.0.9         ggplot2_3.3.6       reticulate_1.28    
## [19] knitr_1.39         
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-157        fs_1.5.2            lubridate_1.8.0    
##  [4] RColorBrewer_1.1-3  httr_1.4.4          rprojroot_2.0.3    
##  [7] tools_4.2.1         backports_1.4.1     bslib_0.4.0        
## [10] utf8_1.2.2          R6_2.5.1            DBI_1.1.3          
## [13] colorspace_2.0-3    withr_2.5.0         tidyselect_1.1.2   
## [16] compiler_4.2.1      cli_3.3.0           rvest_1.0.2        
## [19] xml2_1.3.3          labeling_0.4.2      bookdown_0.28      
## [22] sass_0.4.2          scales_1.2.0        digest_0.6.29      
## [25] minqa_1.2.5         rmarkdown_2.15      pkgconfig_2.0.3    
## [28] htmltools_0.5.3     dbplyr_2.2.1        fastmap_1.1.0      
## [31] highr_0.9           rlang_1.0.4         readxl_1.4.1       
## [34] rstudioapi_0.13     jquerylib_0.1.4     farver_2.1.1       
## [37] generics_0.1.3      jsonlite_1.8.0      googlesheets4_1.0.1
## [40] magrittr_2.0.3      munsell_0.5.0       fansi_1.0.3        
## [43] lifecycle_1.0.1     stringi_1.7.8       yaml_2.3.5         
## [46] MASS_7.3-57         grid_4.2.1          crayon_1.5.1       
## [49] haven_2.5.0         splines_4.2.1       hms_1.1.2          
## [52] pillar_1.8.1        boot_1.3-28         reprex_2.0.2       
## [55] glue_1.6.2          evaluate_0.16       modelr_0.1.8       
## [58] nloptr_2.0.3        png_0.1-7           vctrs_0.4.1        
## [61] tzdb_0.3.0          cellranger_1.1.0    gtable_0.3.0       
## [64] assertthat_0.2.1    cachem_1.0.6        xfun_0.32          
## [67] broom_1.0.0         googledrive_2.0.0   gargle_1.2.0       
## [70] ellipsis_0.3.2      here_1.0.1
```

# MLE of logistic regression - Three Methods

In this section, I will use three methods, Newton-Raphson, Fisher scoring, and IRLS (iteratively reweighted least squares), to estimate \(\beta_0, \beta_1\). The data set used in this section comes from exercise 17.1 of Richard M. Heiberger (2015).
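As a preview, here is a minimal Newton-Raphson sketch for the logit model. Simulated data stand in for the exercise 17.1 data set, which is not reproduced here, and all names are illustrative:

```r
set.seed(17)
x <- rnorm(100)
y <- rbinom(100, size = 1, prob = plogis(-0.5 + 1.2 * x))
X <- cbind(1, x)  # design matrix with intercept

beta <- c(0, 0)   # starting values
for (iter in 1:25) {
  p <- plogis(drop(X %*% beta))
  U <- crossprod(X, y - p)              # score, equations (8) and (9)
  H <- crossprod(X, (p * (1 - p)) * X)  # information matrix X' W X
  step <- drop(solve(H, U))
  beta <- beta + step
  if (max(abs(step)) < 1e-10) break     # converged
}
beta
coef(glm(y ~ x, family = binomial))     # should match
```

For the canonical logit link the observed and expected information matrices coincide, so this Newton-Raphson iteration is also Fisher scoring; IRLS rewrites the same update as a weighted least-squares step.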

# References


Agresti, Alan. 2016. *Foundations of Linear and Generalized Linear Models*. First edition.

Heiberger, Richard M., and Burt Holland. 2015. *Statistical Analysis and Data Display: An Intermediate Course with Examples in R*. Second edition.