# 1 GLMs

## 1.1 GLMs Poisson

Whyte, et al 1987 (Dobson, 1990) reported the number of deaths due to AIDS in Australia per 3 month period from January 1983 to June 1986. The data are explanatory variable $$X$$: time (measured in multiple of 3 month after January 1983), and response variable $$Y$$: number of deaths in Australia due to AIDS.

x <- 1:14
y <- c(0, 1, 2, 3, 1, 4, 9, 18, 23, 31, 20, 25, 37, 45)
cbind(x, y)
##        x  y
##  [1,]  1  0
##  [2,]  2  1
##  [3,]  3  2
##  [4,]  4  3
##  [5,]  5  1
##  [6,]  6  4
##  [7,]  7  9
##  [8,]  8 18
##  [9,]  9 23
## [10,] 10 31
## [11,] 11 20
## [12,] 12 25
## [13,] 13 37
## [14,] 14 45
• Fit a Poisson regression model using time as explanatory variable
• Interpret the coefficient estimate of time
• Test if time has an effect on the number of death due to AIDS

Solution

res <- glm(y ~ x, family = "poisson")
summary(res)
##
## Call:
## glm(formula = y ~ x, family = "poisson")
##
## Deviance Residuals:
##      Min        1Q    Median        3Q       Max
## -2.21008  -1.02032  -0.69704   0.04028   2.70758
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.33963    0.25119   1.352    0.176
## x            0.25652    0.02204  11.639   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
##     Null deviance: 207.272  on 13  degrees of freedom
## Residual deviance:  29.654  on 12  degrees of freedom
## AIC: 86.581
##
## Number of Fisher Scoring iterations: 5

Coefficients

The coefficient table shows the parameter estimates, their standard errors and test statistic for testing if the coefficient is zero. The estimated equation is $log(\mu) = \beta_0 + \beta_1 x$ $log(\mu^*) = \beta_0 + \beta_1 (x+1)$ $log(\mu^*) - log(\mu) = \beta_1$ $\mu^*/\mu = exp(\beta_1)$

$\hat \mu = exp(0.340 + 0.257 x)$ The number of deaths due to AIDS in a period of time was in average exp(.257) = 1.29 times higher than in the previous time period.

Hypothesis test for individual coefficient

Hypotheses: $$H_0: \beta_1 = 0$$ versus $$H_1: \beta_1 \neq 0$$

Choose significance level $$\alpha = 0.05$$ so the probability of incorrectly rejecting the null when it is true is 5%.

Assumptions: Random sample, large sample size.

Test statistic: Wald Z $$z_{obs} = 11.639$$. $$Z \sim N(0, 1)$$

P-value: Probability of observing a test statistic value of 11.639 or more extreme (in both directions) in the null distribution.

$$P(Z > |z_{\mbox{obs}}|)$$

z <- 11.639
2*(1-pnorm(z))
## [1] 0

p-value < 0.05. Reject the null hypothesis $$H_0$$. At the significance level 5%, the data provide sufficient evidence that the time has an effect on the number of deaths due to AIDS in Australia

## 1.2 GLMs Binomial

The following R output reports the result of an experiment for pesticide which attempted to kill beetles. Beetles were exposed to gaseous carbon disulphide at various concentrations (in mg/L) for five hours and the number of beetles killed were noted.

Call:
glm(formula = cbind(Killed, Survived) ~ Dose, family = binomial)
> summary(modl)
Coefficients:
Estimate  Std. Error  z value  Pr(>|z|)
(Intercept) -14.82300     1.28959   -11.49    <2e-16 ***
Dose          0.24942     0.02139    11.66    <2e-16 ***
Null deviance:   284.2024 on 7 degrees of freedom
Residual deviance: 7.3849 on 6 degrees of freedom
> vcov(modl)
(Intercept)          Dose
(Intercept)   1.66302995 -0.0274363845
Dose         -0.02743638  0.0004573762

• Write down the fitted model.
• Perform a hypothesis test to determine the significance of Dose. You need to state the hypothesis, state the calculated test statistic, state the reference distribution under the null hypothesis, set a decision rule and state the conclusion.
• Estimate the odds ratio as well as its significance when Dose increases 4 units. Interpret the odds ratio.
• Provide the 95% confidence interval for the probability of killing when the value of Dose equals 62.

Solution

• Let $$Y_i$$ be the number of beetles killed, $$n_i$$ be the total number of beetles, and $$x_i$$ the corresponding dose value for each observation $$i$$. The model is

$Y_i \sim Bin(n_i, p_i),$ $\mbox{logit}(p_i) = \log \left(\frac{p_i}{1-p_i} \right)= \beta_0 + \beta_1 x_i$

The fitted model is

$\mbox{logit}(p_i) = \log \left(\frac{p_i}{1-p_i} \right)= -14.823 + 0.24942 x_i$

• We test $$H_0: \beta_1 = 0$$ versus $$H_1: \beta_1 \neq 0$$. The test statistic is $$\hat \beta_1/SE(\hat \beta_1) = 0.24942/0.02139=11.66$$. The test statistic distribution under the null hypothesis is N(0,1). The p-value based on N(0,1) distribution is less than $$0.05$$. Therefore, we reject the null hypothesis, Dose is significant.

$logit(p) = \beta_0 + \beta_1 x$

If we increase variable Dose ($$x$$) by 4 units

$logit(p^*) = \beta_0 + \beta_1 (x+4)$

$logit(p^*) - logit(p) = \beta_1 \times 4$

$\log\left(\frac{\frac{p^*}{1-p^*}}{\frac{p}{1-p}} \right) = \beta_1 \times 4$

The odds ratio when Dose increases 4 units is

$OR = \frac{\frac{p^*}{1-p^*}}{\frac{p}{1-p}} = exp(\beta_1 \times 4) = exp(0.24942 \times 4) = 2.7120$

The interpretation is if the Dose increases 4 units, the odds for beetle to be killed are multiplied by 2.71.

• Let $$\eta$$ be the value of the linear component when Dose equals to 62. Then, $\hat \eta = \boldsymbol{x}' \boldsymbol{\hat \beta} = \boldsymbol{x}' (\hat \beta_0, \hat \beta_1)' = (1, 62) (\hat \beta_0, \hat \beta_1)' = -14.823 + 0.24942 \times 62 = 0.6410$

Its variance is

$Var({\hat \eta}) = Var(\boldsymbol{x}' \boldsymbol{\hat \beta}) = \boldsymbol{x}' Var(\boldsymbol{\hat \beta}) \boldsymbol{x} = (1, 62) \begin{pmatrix} 1.66303 & -0.027436\\ -0.027436 & 0.00045738 \end{pmatrix} \begin{pmatrix} 1 \\ 62 \end{pmatrix} = 0.019145$

Then, the 95% confidence interval for $$\eta$$ is $0.6410 \pm 1.96 \sqrt{0.019145} = [0.3699, 0.9121].$ Therefore, the 95% confidence interval for the probability of killing is $\left(\frac{e^{0.3699}}{1+e^{0.3699}} , \frac{e^{0.9121}}{1+e^{0.9121}} \right) = (0.5914, 0.7134).$