Solutions: https://www.paulamoraga.com/course-aramco/99-problems-5glm-solutions.html

Whyte, et al 1987 (Dobson, 1990) reported the number of deaths due to AIDS in Australia per 3 month period from January 1983 to June 1986. The data are explanatory variable \(X\): time (measured in multiple of 3 month after January 1983), and response variable \(Y\): number of deaths in Australia due to AIDS.

```
<- 1:14
x <- c(0, 1, 2, 3, 1, 4, 9, 18, 23, 31, 20, 25, 37, 45)
y cbind(x, y)
```

```
## x y
## [1,] 1 0
## [2,] 2 1
## [3,] 3 2
## [4,] 4 3
## [5,] 5 1
## [6,] 6 4
## [7,] 7 9
## [8,] 8 18
## [9,] 9 23
## [10,] 10 31
## [11,] 11 20
## [12,] 12 25
## [13,] 13 37
## [14,] 14 45
```

- Fit a Poisson regression model using time as explanatory variable
- Interpret the coefficient estimate of time
- Test if time has an effect on the number of death due to AIDS

**Solution**

```
<- glm(y ~ x, family = "poisson")
res summary(res)
```

```
##
## Call:
## glm(formula = y ~ x, family = "poisson")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.21008 -1.02032 -0.69704 0.04028 2.70758
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.33963 0.25119 1.352 0.176
## x 0.25652 0.02204 11.639 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 207.272 on 13 degrees of freedom
## Residual deviance: 29.654 on 12 degrees of freedom
## AIC: 86.581
##
## Number of Fisher Scoring iterations: 5
```

**Coefficients**

The coefficient table shows the parameter estimates, their standard errors and test statistic for testing if the coefficient is zero. The estimated equation is \[log(\mu) = \beta_0 + \beta_1 x\] \[log(\mu^*) = \beta_0 + \beta_1 (x+1)\] \[log(\mu^*) - log(\mu) = \beta_1\] \[\mu^*/\mu = exp(\beta_1)\]

\[\hat \mu = exp(0.340 + 0.257 x)\] The number of deaths due to AIDS in a period of time was in average exp(.257) = 1.29 times higher than in the previous time period.

**Hypothesis test for individual coefficient**

Hypotheses: \(H_0: \beta_1 = 0\) versus \(H_1: \beta_1 \neq 0\)

Choose significance level \(\alpha = 0.05\) so the probability of incorrectly rejecting the null when it is true is 5%.

Assumptions: Random sample, large sample size.

Test statistic: Wald Z \(z_{obs} = 11.639\). \(Z \sim N(0, 1)\)

P-value: Probability of observing a test statistic value of 11.639 or more extreme (in both directions) in the null distribution.

\(P(Z > |z_{\mbox{obs}}|)\)

```
<- 11.639
z 2*(1-pnorm(z))
```

`## [1] 0`

p-value < 0.05. Reject the null hypothesis \(H_0\). At the significance level 5%, the data provide sufficient evidence that the time has an effect on the number of deaths due to AIDS in Australia

The following R output reports the result of an experiment for pesticide which attempted to kill beetles. Beetles were exposed to gaseous carbon disulphide at various concentrations (in mg/L) for five hours and the number of beetles killed were noted.

```
:
Callglm(formula = cbind(Killed, Survived) ~ Dose, family = binomial)
> summary(modl)
:
CoefficientsPr(>|z|)
Estimate Std. Error z value -14.82300 1.28959 -11.49 <2e-16 ***
(Intercept) 0.24942 0.02139 11.66 <2e-16 ***
Dose : 284.2024 on 7 degrees of freedom
Null deviance: 7.3849 on 6 degrees of freedom
Residual deviance> vcov(modl)
(Intercept) Dose1.66302995 -0.0274363845
(Intercept) -0.02743638 0.0004573762
Dose
```

- Write down the fitted model.
- Perform a hypothesis test to determine the significance of Dose. You need to state the hypothesis, state the calculated test statistic, state the reference distribution under the null hypothesis, set a decision rule and state the conclusion.
- Estimate the odds ratio as well as its significance when Dose increases 4 units. Interpret the odds ratio.
- Provide the 95% confidence interval for the probability of killing when the value of Dose equals 62.

**Solution**

- Let \(Y_i\) be the number of beetles killed, \(n_i\) be the total number of beetles, and \(x_i\) the corresponding dose value for each observation \(i\). The model is

\[Y_i \sim Bin(n_i, p_i),\] \[\mbox{logit}(p_i) = \log \left(\frac{p_i}{1-p_i} \right)= \beta_0 + \beta_1 x_i\]

The fitted model is

\[\mbox{logit}(p_i) = \log \left(\frac{p_i}{1-p_i} \right)= -14.823 + 0.24942 x_i\]

We test \(H_0: \beta_1 = 0\) versus \(H_1: \beta_1 \neq 0\). The test statistic is \(\hat \beta_1/SE(\hat \beta_1) = 0.24942/0.02139=11.66\). The test statistic distribution under the null hypothesis is N(0,1). The p-value based on N(0,1) distribution is less than \(0.05\). Therefore, we reject the null hypothesis, Dose is significant.

\[logit(p) = \beta_0 + \beta_1 x \]

If we increase variable Dose (\(x\)) by 4 units

\[logit(p^*) = \beta_0 + \beta_1 (x+4) \]

\[logit(p^*) - logit(p) = \beta_1 \times 4\]

\[\log\left(\frac{\frac{p^*}{1-p^*}}{\frac{p}{1-p}} \right) = \beta_1 \times 4\]

The odds ratio when Dose increases 4 units is

\[OR = \frac{\frac{p^*}{1-p^*}}{\frac{p}{1-p}} = exp(\beta_1 \times 4) = exp(0.24942 \times 4) = 2.7120\]

The interpretation is if the Dose increases 4 units, the odds for beetle to be killed are multiplied by 2.71.

- Let \(\eta\) be the value of the linear component when Dose equals to 62. Then, \[\hat \eta = \boldsymbol{x}' \boldsymbol{\hat \beta} = \boldsymbol{x}' (\hat \beta_0, \hat \beta_1)' = (1, 62) (\hat \beta_0, \hat \beta_1)' = -14.823 + 0.24942 \times 62 = 0.6410\]

Its variance is

\[Var({\hat \eta}) = Var(\boldsymbol{x}' \boldsymbol{\hat \beta}) = \boldsymbol{x}' Var(\boldsymbol{\hat \beta}) \boldsymbol{x} = (1, 62) \begin{pmatrix} 1.66303 & -0.027436\\ -0.027436 & 0.00045738 \end{pmatrix} \begin{pmatrix} 1 \\ 62 \end{pmatrix} = 0.019145\]

Then, the 95% confidence interval for \(\eta\) is \[0.6410 \pm 1.96 \sqrt{0.019145} = [0.3699, 0.9121].\] Therefore, the 95% confidence interval for the probability of killing is \[\left(\frac{e^{0.3699}}{1+e^{0.3699}} , \frac{e^{0.9121}}{1+e^{0.9121}} \right) = (0.5914, 0.7134).\]