Solutions: https://www.paulamoraga.com/course-aramco/99-problems-5glm-solutions.html

1 GLMs

1.1 GLMs Poisson

Whyte, et al 1987 (Dobson, 1990) reported the number of deaths due to AIDS in Australia per 3 month period from January 1983 to June 1986. The data are explanatory variable \(X\): time (measured in multiple of 3 month after January 1983), and response variable \(Y\): number of deaths in Australia due to AIDS.

x <- 1:14
y <- c(0, 1, 2, 3, 1, 4, 9, 18, 23, 31, 20, 25, 37, 45)
cbind(x, y)
##        x  y
##  [1,]  1  0
##  [2,]  2  1
##  [3,]  3  2
##  [4,]  4  3
##  [5,]  5  1
##  [6,]  6  4
##  [7,]  7  9
##  [8,]  8 18
##  [9,]  9 23
## [10,] 10 31
## [11,] 11 20
## [12,] 12 25
## [13,] 13 37
## [14,] 14 45
  • Fit a Poisson regression model using time as explanatory variable
  • Interpret the coefficient estimate of time
  • Test if time has an effect on the number of death due to AIDS

Solution

res <- glm(y ~ x, family = "poisson")
summary(res)
## 
## Call:
## glm(formula = y ~ x, family = "poisson")
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.21008  -1.02032  -0.69704   0.04028   2.70758  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.33963    0.25119   1.352    0.176    
## x            0.25652    0.02204  11.639   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 207.272  on 13  degrees of freedom
## Residual deviance:  29.654  on 12  degrees of freedom
## AIC: 86.581
## 
## Number of Fisher Scoring iterations: 5

Coefficients

The coefficient table shows the parameter estimates, their standard errors and test statistic for testing if the coefficient is zero. The estimated equation is \[log(\mu) = \beta_0 + \beta_1 x\] \[log(\mu^*) = \beta_0 + \beta_1 (x+1)\] \[log(\mu^*) - log(\mu) = \beta_1\] \[\mu^*/\mu = exp(\beta_1)\]

\[\hat \mu = exp(0.340 + 0.257 x)\] The number of deaths due to AIDS in a period of time was in average exp(.257) = 1.29 times higher than in the previous time period.

Hypothesis test for individual coefficient

Hypotheses: \(H_0: \beta_1 = 0\) versus \(H_1: \beta_1 \neq 0\)

Choose significance level \(\alpha = 0.05\) so the probability of incorrectly rejecting the null when it is true is 5%.

Assumptions: Random sample, large sample size.

Test statistic: Wald Z \(z_{obs} = 11.639\). \(Z \sim N(0, 1)\)

P-value: Probability of observing a test statistic value of 11.639 or more extreme (in both directions) in the null distribution.

\(P(Z > |z_{\mbox{obs}}|)\)

z <- 11.639
2*(1-pnorm(z))
## [1] 0

p-value < 0.05. Reject the null hypothesis \(H_0\). At the significance level 5%, the data provide sufficient evidence that the time has an effect on the number of deaths due to AIDS in Australia

1.2 GLMs Binomial

The following R output reports the result of an experiment for pesticide which attempted to kill beetles. Beetles were exposed to gaseous carbon disulphide at various concentrations (in mg/L) for five hours and the number of beetles killed were noted.

Call:
glm(formula = cbind(Killed, Survived) ~ Dose, family = binomial)
> summary(modl)
Coefficients:
             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept) -14.82300     1.28959   -11.49    <2e-16 ***
Dose          0.24942     0.02139    11.66    <2e-16 ***
Null deviance:   284.2024 on 7 degrees of freedom
Residual deviance: 7.3849 on 6 degrees of freedom
> vcov(modl)
             (Intercept)          Dose
(Intercept)   1.66302995 -0.0274363845
Dose         -0.02743638  0.0004573762
  • Write down the fitted model.
  • Perform a hypothesis test to determine the significance of Dose. You need to state the hypothesis, state the calculated test statistic, state the reference distribution under the null hypothesis, set a decision rule and state the conclusion.
  • Estimate the odds ratio as well as its significance when Dose increases 4 units. Interpret the odds ratio.
  • Provide the 95% confidence interval for the probability of killing when the value of Dose equals 62.

Solution

  • Let \(Y_i\) be the number of beetles killed, \(n_i\) be the total number of beetles, and \(x_i\) the corresponding dose value for each observation \(i\). The model is

\[Y_i \sim Bin(n_i, p_i),\] \[\mbox{logit}(p_i) = \log \left(\frac{p_i}{1-p_i} \right)= \beta_0 + \beta_1 x_i\]

The fitted model is

\[\mbox{logit}(p_i) = \log \left(\frac{p_i}{1-p_i} \right)= -14.823 + 0.24942 x_i\]

  • We test \(H_0: \beta_1 = 0\) versus \(H_1: \beta_1 \neq 0\). The test statistic is \(\hat \beta_1/SE(\hat \beta_1) = 0.24942/0.02139=11.66\). The test statistic distribution under the null hypothesis is N(0,1). The p-value based on N(0,1) distribution is less than \(0.05\). Therefore, we reject the null hypothesis, Dose is significant.

\[logit(p) = \beta_0 + \beta_1 x \]

If we increase variable Dose (\(x\)) by 4 units

\[logit(p^*) = \beta_0 + \beta_1 (x+4) \]

\[logit(p^*) - logit(p) = \beta_1 \times 4\]

\[\log\left(\frac{\frac{p^*}{1-p^*}}{\frac{p}{1-p}} \right) = \beta_1 \times 4\]

The odds ratio when Dose increases 4 units is

\[OR = \frac{\frac{p^*}{1-p^*}}{\frac{p}{1-p}} = exp(\beta_1 \times 4) = exp(0.24942 \times 4) = 2.7120\]

The interpretation is if the Dose increases 4 units, the odds for beetle to be killed are multiplied by 2.71.

  • Let \(\eta\) be the value of the linear component when Dose equals to 62. Then, \[\hat \eta = \boldsymbol{x}' \boldsymbol{\hat \beta} = \boldsymbol{x}' (\hat \beta_0, \hat \beta_1)' = (1, 62) (\hat \beta_0, \hat \beta_1)' = -14.823 + 0.24942 \times 62 = 0.6410\]

Its variance is

\[Var({\hat \eta}) = Var(\boldsymbol{x}' \boldsymbol{\hat \beta}) = \boldsymbol{x}' Var(\boldsymbol{\hat \beta}) \boldsymbol{x} = (1, 62) \begin{pmatrix} 1.66303 & -0.027436\\ -0.027436 & 0.00045738 \end{pmatrix} \begin{pmatrix} 1 \\ 62 \end{pmatrix} = 0.019145\]

Then, the 95% confidence interval for \(\eta\) is \[0.6410 \pm 1.96 \sqrt{0.019145} = [0.3699, 0.9121].\] Therefore, the 95% confidence interval for the probability of killing is \[\left(\frac{e^{0.3699}}{1+e^{0.3699}} , \frac{e^{0.9121}}{1+e^{0.9121}} \right) = (0.5914, 0.7134).\]