In a small study involving 12 children, the patients' heights and weights were recorded. A catheter is passed into an artery at the femoral region and pushed up into the heart to obtain information about the heart's physiology and functional ability. The exact catheter length required was determined for each patient. Data are reported in the table below.
Height (in.) | Weight (lb) | Length (cm) | e^2 |
---|---|---|---|
42.8 | 40.0 | 37.0 | 0.0021 |
63.5 | 93.5 | 49.5 | 3.3104 |
37.5 | 35.5 | 34.5 | 0.4175 |
39.5 | 30.0 | 36.0 | 2.2821 |
45.5 | 52.0 | 43.0 | 9.8240 |
38.5 | 17.0 | 28.0 | 14.5329 |
43.0 | 38.5 | 37.0 | 0.0406 |
22.5 | 8.5 | 20.0 | 49.6808 |
37.0 | 33.0 | 33.5 | 1.1468 |
23.5 | 9.5 | 30.5 | 9.3903 |
33.0 | 21.0 | 38.5 | 49.0623 |
58.0 | 79.0 | 47.0 | 0.2232 |
The residual sum of squares is \(\sum_i e_i^2 = 139.913\).
Consider the following linear model to explain the dependence of catheter length on height and weight:
\[\mbox{length}_i = \beta_1 + \beta_2 \mbox{height}_i + \beta_3 \mbox{weight}_i + \epsilon_i\]
where the \(\epsilon_i\) are independent and Gaussian distributed as \(\epsilon_i \sim N(0, \sigma^2)\). Squared values of the residuals \(e_i\) from the model are in the table, together with their sum. For the model, \(\boldsymbol{X}\) denotes the design matrix and we have \[(\boldsymbol{X'X})^{-1} = \begin{pmatrix} 4.926 & -0.197 & 0.082\\ -0.197 & 0.008 & -0.004\\ 0.082 & -0.004 & 0.002 \end{pmatrix} \]
The least squares estimates are \(\hat \beta_1 = 21.008\), \(\hat \beta_2 = 0.196\) and \(\hat \beta_3 = 0.191\).
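These least squares estimates can be reproduced numerically from the table. A minimal Python sketch (the document's own snippets use R; `numpy` is used here purely for illustration):

```python
import numpy as np

# Data from the table: height (in.), weight (lb), catheter length (cm)
height = np.array([42.8, 63.5, 37.5, 39.5, 45.5, 38.5,
                   43.0, 22.5, 37.0, 23.5, 33.0, 58.0])
weight = np.array([40.0, 93.5, 35.5, 30.0, 52.0, 17.0,
                   38.5,  8.5, 33.0,  9.5, 21.0, 79.0])
length = np.array([37.0, 49.5, 34.5, 36.0, 43.0, 28.0,
                   37.0, 20.0, 33.5, 30.5, 38.5, 47.0])

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(height), height, weight])

# Least squares: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ length)
residuals = length - X @ beta_hat
rss_value = np.sum(residuals**2)

print(beta_hat)    # should be close to the reported 21.008, 0.196, 0.191
print(rss_value)   # should be close to the reported 139.913
```

The printed coefficients agree with the reported estimates up to rounding.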
State the formula for computing the unbiased estimator for \(\sigma^2\). Then compute its value.
Compute the standard errors for the estimates of \(\beta_2\) and \(\beta_3\). Then construct confidence intervals for \(\beta_2\) and \(\beta_3\) using \(\alpha = 0.05\). What do you conclude?
Since \(\hat \beta = (X'X)^{-1}X' Y\) and \(Y \sim N(X\beta, \sigma^2 I)\),
\[\hat \beta \sim N(\beta, \sigma^2(X'X)^{-1})\]
\[\frac{\hat \beta_j - \beta_j}{se(\hat \beta_j)}= \frac{\hat \beta_j - \beta_j}{\sqrt{\hat \sigma^2 {{(X'X)}_{jj}^{-1}}}} \sim t_{n-p-1}\]
An unbiased estimator for \(\sigma^2\) is
\[\hat \sigma^2 = \frac{\sum_{i=1}^n (y_i-\hat y_i)^2}{n-p-1} = \frac{\sum_{i=1}^n e_i^2}{12-2-1} = \frac{139.913}{9}=15.546\]
Here, \(n = 12\), \(p = 2\) is the number of covariates, and the residual sum of squares is given in the table: \(\sum_i e_i^2 = 139.913\).
\[se(\hat \beta_j) = \hat \sigma \sqrt{{(X'X)}_{jj}^{-1}},\ \ j=1,2,3\]
\[\hat \sigma = \sqrt{\hat \sigma^2} = \sqrt{15.546} = 3.943\]
Hence, taking the square roots of the second and third diagonal elements of \((X'X)^{-1}\), we have
\[se(\hat \beta_2)= 3.943 \times \sqrt{0.008} = 0.353\] \[se(\hat \beta_3) = 3.943 \times \sqrt{0.002} = 0.176\]
Confidence intervals (CI) are given by
\[\hat \beta_j \pm t_{\alpha/2, n-3}\ se(\hat \beta_j)\]
95% CI for \(\beta_2\) is \((0.196 \pm 2.262 \times 0.353) = (-0.602, 0.994)\)
95% CI for \(\beta_3\) is \((0.191 \pm 2.262 \times 0.176) = (-0.207, 0.589)\)
(df <- 12-2-1)
[1] 9
qt(0.025, df)
[1] -2.262157
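The same interval arithmetic can be checked numerically from the summary quantities alone. A Python sketch (`scipy` is an illustrative choice; the document's snippets are in R):

```python
import math
from scipy import stats

n, p = 12, 2                      # observations, covariates
rss = 139.913                     # residual sum of squares from the table
sigma2_hat = rss / (n - p - 1)    # unbiased estimate of sigma^2
sigma_hat = math.sqrt(sigma2_hat)

# Diagonal entries of (X'X)^{-1} for beta_2 and beta_3, as given above
se_b2 = sigma_hat * math.sqrt(0.008)
se_b3 = sigma_hat * math.sqrt(0.002)

t_crit = stats.t.ppf(0.975, n - p - 1)   # 2.262

ci_b2 = (0.196 - t_crit * se_b2, 0.196 + t_crit * se_b2)
ci_b3 = (0.191 - t_crit * se_b3, 0.191 + t_crit * se_b3)
print(ci_b2, ci_b3)   # both intervals contain zero
```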
Both intervals include zero: at the 5% level, neither height nor weight is significant in the presence of the other covariate. It might be that one or the other is separately relevant, but not when both are jointly in the model.
\[\mbox{length}_i = \beta_1 + \beta_2 \mbox{height}_i + \beta_3 \mbox{weight}_i + \epsilon_i\]
We conduct an F test to test the hypothesis
\(H_0: \beta_3 = 0\) (reduced model has intercept and height)
\(H_1: \beta_3 \neq 0\) (full model has intercept, height and also the weight covariate)
\[F=\frac{(RSS(\mathrm{reduced}) - RSS(\mathrm{full}))/(p_1-p_0)}{RSS(\mathrm{full})/(n-p_1-1)} \sim F_{p_1-p_0,\ n-p_1-1},\]
where \(p_0\) and \(p_1\) denote the number of covariates (excluding the intercept) in the reduced and full models; here \(p_0 = 1\) and \(p_1 = 2\).
We have RSS(reduced) = 160.665 and RSS(full) = 139.913.
\[F_{obs} = \frac{ ( \mbox{RSS(reduced)} - \mbox{RSS(full) )}/(p_1-p_0)}{\mbox{RSS(full)}/(n-p_1-1)} = \frac{(160.665-139.913)/(2-1)}{139.913/(12-2-1)} = 1.335\]
The p-value \(P(F \geq F_{obs})\) exceeds \(\alpha = 0.05\):
F <- 1.335
(1 - pf(F, 1, 9))
[1] 0.27767
We have \(F_{obs} = 1.335\), smaller than the critical value, the quantile \(F_{0.95,\,1,\,9} = 5.117\).
(cr <- qf(0.95, 1, 9))
[1] 5.117355
Therefore, we fail to reject the null hypothesis \(H_0: \beta_3=0\): there is not sufficient evidence against it. That is, we do not need the information carried by weight when height is already in the model.
Consider the regression model
\[Y_i = \beta_0 + \beta_1 X_i + \beta_2 (X_i^2-5)/4+\epsilon_i,\ i=1,\ldots,4, \] where \(\epsilon_1,\ldots,\epsilon_4\) are i.i.d. with \(E[\epsilon_i]=0\), \(Var[\epsilon_i]=\sigma^2\).
Suppose \(\boldsymbol{X}=(X_1,\ldots,X_4)' = (-3,-1,1,3)'\) and \(\boldsymbol{Y}=(Y_1,\ldots,Y_4)'=(1,2,2,4)'\).
Find the BLUE \(\boldsymbol{\hat \beta}=(\hat \beta_0, \hat \beta_1, \hat \beta_2)'\).
Based on this model, obtain a \(95\%\) CI for \(Y\) when \(X=0\). Hint: MSE = 0.45.
\[\boldsymbol{\hat{\beta}} = \boldsymbol{(X'X)^{-1} X' Y}\]
\[\boldsymbol{X}=\begin{pmatrix} 1 &-3 & 1\\ 1 &-1 & -1\\ 1 & 1 & -1\\ 1 & 3 & 1\\ \end{pmatrix}\]
\[\boldsymbol{X'X}=\begin{pmatrix} 1 &-3 & 1\\ 1 &-1 & -1\\ 1 & 1 & -1\\ 1 & 3 & 1\\ \end{pmatrix}' \begin{pmatrix} 1 &-3 & 1\\ 1 &-1 & -1\\ 1 & 1 & -1\\ 1 & 3 & 1\\ \end{pmatrix}= \begin{pmatrix} 1 & 1 & 1 & 1\\ -3 & -1 & 1 & 3\\ 1 & -1 & -1 & 1\\ \end{pmatrix} \begin{pmatrix} 1 &-3 & 1\\ 1 &-1 & -1\\ 1 & 1 & -1\\ 1 & 3 & 1\\ \end{pmatrix}= \begin{pmatrix} 4 &0 & 0\\ 0 &20 & 0\\ 0 & 0 & 4\\ \end{pmatrix} = 4 \begin{pmatrix} 1 & 0 & 0\\ 0 & 5 & 0\\ 0 & 0 & 1\\ \end{pmatrix}\\ \]
\[ \boldsymbol{(X'X)^{-1}}=\frac{1}{4} \begin{pmatrix} 1 & 0 & 0\\ 0 & 1/5 & 0\\ 0 & 0 & 1\\ \end{pmatrix}\\ \]
\[ \boldsymbol{X'Y}=\begin{pmatrix} 1 &-3 & 1\\ 1 &-1 & -1\\ 1 & 1 & -1\\ 1 & 3 & 1\\ \end{pmatrix}' \begin{pmatrix} 1 \\ 2 \\ 2 \\ 4\\ \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1\\ -3 & -1 & 1 & 3\\ 1 & -1 & -1 & 1\\ \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 2 \\ 4\\ \end{pmatrix}= \begin{pmatrix} 9 \\ 9 \\ 1 \\ \end{pmatrix}\\ \]
\[
\boldsymbol{\hat \beta}= \boldsymbol{(X'X)^{-1} X'Y} = \frac{1}{4} \begin{pmatrix}
1 & 0 & 0\\
0 & 1/5 & 0\\
0 & 0 & 1\\
\end{pmatrix} \begin{pmatrix}
9 \\
9 \\
1 \\
\end{pmatrix} =\begin{pmatrix}
9/4 \\
9/20 \\
1/4 \\
\end{pmatrix}
\]
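The matrix algebra above can be verified directly. A Python sketch with `numpy` (an illustrative choice; not part of the original solution):

```python
import numpy as np

# Design matrix with columns 1, X_i, (X_i^2 - 5)/4 for X = (-3, -1, 1, 3)
X = np.array([[1, -3,  1],
              [1, -1, -1],
              [1,  1, -1],
              [1,  3,  1]], dtype=float)
Y = np.array([1, 2, 2, 4], dtype=float)

# Normal equations: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)   # [2.25, 0.45, 0.25], i.e. (9/4, 9/20, 1/4)
```

Because \(\boldsymbol{X'X}\) is diagonal here, each coefficient can also be read off as \((\boldsymbol{X'Y})_j\) divided by the corresponding diagonal entry.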
\((1-\alpha)100\%\) confidence interval:
\[\hat Y_0 \pm t_{n-p-1, \alpha/2} \times se(\hat Y_0),\] where
\(\hat Y_0 = \boldsymbol{X_0' \hat \beta}\)
\(Var(\hat Y_0) = \boldsymbol{X'_0} Var(\boldsymbol{\hat \beta}) \boldsymbol{X_0} = \sigma^2 \boldsymbol{X_0' (X'X)^{-1} X_0}\)
\(se(\hat Y_0) = \sqrt{MSE\ \boldsymbol{X_0'(X'X)^{-1} X_0}}\)
\[\mbox{When } X = 0,\ \ \boldsymbol{X_0'}=\left(1,0,\frac{-5}{4}\right)\]
\[\hat Y_0 = \boldsymbol{X_0' \hat \beta} = 1\left(\frac{9}{4}\right)+0\left(\frac{9}{20}\right)- \frac{5}{4}\left(\frac{1}{4}\right)=\frac{31}{16} \]
\[\boldsymbol{X_0' (X'X)^{-1} X_0} = \left(1,\ 0,\ -\tfrac{5}{4}\right) \frac{1}{4}\begin{pmatrix} 1 & 0 & 0\\ 0 & 1/5 & 0\\ 0 & 0 & 1\\ \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ -5/4 \\ \end{pmatrix}= \frac{1}{4}\left(1 + \frac{25}{16}\right) = \frac{1}{4} \times \frac{41}{16}\]
\[\mbox{The } 95\% \mbox{ CI is }\ \frac{31}{16} \pm \underbrace{t_{(n-p-1),1-\frac{\alpha}{2}}}_{=12.71}\sqrt{0.45\left(\frac{1}{4}\right)\left(\frac{41}{16}\right)}=\frac{31}{16} \pm 6.82 = (-4.89,8.76) \]
# n = 4, p = 2, n-p-1 = 1
qt(0.025, 1)
[1] -12.7062
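The full interval can be assembled numerically as well. A Python sketch (`numpy`/`scipy` are illustrative choices; the document's snippets are in R):

```python
import numpy as np
from scipy import stats

# Design matrix as derived above
X = np.array([[1, -3,  1],
              [1, -1, -1],
              [1,  1, -1],
              [1,  3,  1]], dtype=float)
beta_hat = np.array([9/4, 9/20, 1/4])
x0 = np.array([1.0, 0.0, -5/4])   # covariate vector at X = 0

y0_hat = x0 @ beta_hat            # 31/16 = 1.9375
mse = 0.45                        # given in the hint
se_y0 = np.sqrt(mse * x0 @ np.linalg.inv(X.T @ X) @ x0)  # sqrt(0.45 * 41/64)
t_crit = stats.t.ppf(0.975, 1)    # n - p - 1 = 1 degree of freedom

ci = (y0_hat - t_crit * se_y0, y0_hat + t_crit * se_y0)
print(ci)   # approximately (-4.88, 8.76)
```

The interval is very wide because with \(n = 4\) observations and \(p = 2\) covariates only one degree of freedom remains, making the t quantile huge.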