8 Confidence intervals and hypothesis testing
8.1 Confidence interval for a regression coefficient \(\beta_j\)
\[\boldsymbol{\hat \beta} \sim N(\boldsymbol{\beta}, \sigma^2 \boldsymbol{(X'X)^{-1}})\]
\(Var(\boldsymbol{\hat \beta}) = \sigma^2 \boldsymbol{(X'X)^{-1}}\)
\(\hat Var(\boldsymbol{\hat \beta}) = \hat \sigma^2 \boldsymbol{(X'X)^{-1}} = MSE \boldsymbol{(X'X)^{-1}}\)
\(\hat Var(\hat \beta_j)\) is the \(j\)th diagonal element of the matrix \(\hat Var(\boldsymbol{\hat \beta})\)
\(se(\hat \beta_j) = \sqrt{\hat \sigma^2 C_{jj}}\), where \(C_{jj}\) is the \(j\)th diagonal element of \(\boldsymbol{(X'X)^{-1}}\).
\[\frac{\hat \beta_j - \beta_j}{se(\hat \beta_j)} = \frac{\hat \beta_j - \beta_j}{\sqrt{\hat \sigma^2 C_{jj}}} \sim t_{n-p-1}\]
\((1-\alpha)100\%\) confidence interval for \(\beta_j\): \[\hat \beta_j \pm t_{n-p-1,\alpha/2} \ \ se(\hat \beta_j) = \hat \beta_j \pm t_{n-p-1,\alpha/2}\ \ \sqrt{\hat \sigma^2 C_{jj}}\]
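As a sketch, the interval above can be computed directly from the matrix formulas; the data below are simulated purely for illustration (all variable names are illustrative, and `numpy`/`scipy` are assumed available).

```python
import numpy as np
from scipy import stats

# Simulated illustration (not from the text): n observations, p predictors.
rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept
beta = np.array([1.0, 2.0, -0.5])                           # true coefficients
y = X @ beta + rng.normal(size=n)

C = np.linalg.inv(X.T @ X)                                  # (X'X)^{-1}
beta_hat = C @ X.T @ y                                      # least squares estimate
resid = y - X @ beta_hat
mse = resid @ resid / (n - p - 1)                           # hat sigma^2 = MSE
se = np.sqrt(mse * np.diag(C))                              # se(beta_hat_j) = sqrt(MSE * C_jj)

alpha = 0.05
t_mult = stats.t.ppf(1 - alpha / 2, df=n - p - 1)           # t_{n-p-1, alpha/2}
lower, upper = beta_hat - t_mult * se, beta_hat + t_mult * se
```

Each `(lower[j], upper[j])` pair is the \(95\%\) interval for \(\beta_j\).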
8.2 Hypothesis test for a regression coefficient \(\beta_j\)
To test \(H_0: \beta_j = 0\) vs \(H_1: \beta_j \neq 0\) we standardize the coefficient estimate, using the pivotal quantity
\[\frac{\hat \beta_j - \beta_j}{se(\hat \beta_j)} = \frac{\hat \beta_j - \beta_j}{\sqrt{\hat \sigma^2 C_{jj}}} \sim t_{n-p-1}\]
Under the null hypothesis that \(\beta_j=0\),
\[t_j = \frac{\hat \beta_j}{se(\hat \beta_j)} \sim t_{n-p-1}\]
Hence, a large (absolute) value of \(t_j\) leads to rejection of this null hypothesis. If \(\hat \sigma\) were replaced by a known value \(\sigma\), then \(t_j\) would have a standard normal distribution. The difference between the tail quantiles of a t-distribution and a standard normal becomes negligible as the sample size increases, and so we typically use the normal quantiles.
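A minimal sketch of this coefficient t-test on simulated data (names and numbers are illustrative assumptions, not from the text); the second predictor is given a true nonzero coefficient, the third a true zero coefficient:

```python
import numpy as np
from scipy import stats

# Simulated data: true coefficients (1, 2, 0), so predictor 2 should be
# clearly significant while predictor 3 is pure noise.
rng = np.random.default_rng(1)
n, p = 80, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

C = np.linalg.inv(X.T @ X)
beta_hat = C @ X.T @ y
mse = np.sum((y - X @ beta_hat) ** 2) / (n - p - 1)
se = np.sqrt(mse * np.diag(C))

t_stat = beta_hat / se                                 # t_j = beta_hat_j / se(beta_hat_j)
p_val = 2 * stats.t.sf(np.abs(t_stat), df=n - p - 1)   # two-sided p-value
```

`stats.t.sf` gives the upper tail probability, so doubling it yields the two-sided p-value.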
8.3 Hypothesis test for a group of coefficients
To test for the significance of groups of coefficients simultaneously we use the F-test.
Model M1 with \(p_1\) predictors nested in model M2 with \(p_2\) predictors, \(p_1 < p_2\).
\(H_0: \beta_{p_1+1} = \ldots = \beta_{p_2} = 0\) (M1 model fits as well as M2 model) vs.
\(H_1: \beta_{j} \neq 0\) for at least one \(j\) with \(p_1 < j \leq p_2\) (M2 fits better than M1)
Under the null hypothesis that the smaller model is correct,
\[F=\frac{(RSS(reduced) - RSS(full))/(p_2-p_1)}{RSS(full)/(n-p_2-1)} \sim F_{p_2-p_1,\ n-p_2-1}\]
where \(RSS(full)\) is the residual sum-of-squares for the least squares fit of the bigger model with \(p_2+1\) parameters, and \(RSS(reduced)\) the same for the nested smaller model with \(p_1 + 1\) parameters.
The F statistic measures the change in residual sum-of-squares per additional parameter in the bigger model, and it is normalized by an estimate of \(\sigma^2\) (MSE).
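The F statistic can be sketched as follows on simulated data (the setup is an illustrative assumption: a full model with \(p_2 = 3\) predictors against a reduced model keeping only the first \(p_1 = 1\)):

```python
import numpy as np
from scipy import stats

# Simulated nested-model comparison: the extra coefficients are truly zero,
# so the null hypothesis (reduced model is adequate) holds here.
rng = np.random.default_rng(2)
n, p1, p2 = 60, 1, 3
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, p2))])
X_red = X_full[:, : p1 + 1]
y = X_full @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares of the least squares fit."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta_hat
    return r @ r

F = ((rss(X_red, y) - rss(X_full, y)) / (p2 - p1)) / (rss(X_full, y) / (n - p2 - 1))
p_val = stats.f.sf(F, p2 - p1, n - p2 - 1)   # P(F_{p2-p1, n-p2-1} > F)
```

Because the models are nested, \(RSS(reduced) \geq RSS(full)\), so the statistic is always nonnegative.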
8.4 Confidence and prediction intervals
8.4.1 Confidence interval for the mean response
Point estimate and confidence interval for the mean response when the predictor variable values are \(\boldsymbol{X_0} = (1, X_{01}, \ldots, X_{0p})'\), \(E(Y|\boldsymbol{X_0}) = \boldsymbol{X'_0} \boldsymbol{\beta}\).
Fitted or predicted value of the response when the predictor values are \(\boldsymbol{X_0}\) \[\hat Y_0 = \boldsymbol{X'_0 \hat \beta}\]
\[Var(\hat Y_0) = \boldsymbol{X'_0} Var(\boldsymbol{\hat \beta}) \boldsymbol{X_0} = \sigma^2 \boldsymbol{X'_0 (X'X)^{-1} X_0}\] \[se(\hat Y_0) = \sqrt{MSE\ \boldsymbol{X_0'(X'X)^{-1}X_0}}\]
\((1-\alpha)100\%\) confidence interval:
\[\mbox{sample estimate} \pm \mbox{t-multiplier} \times \mbox{standard error}\]
\[\hat Y_0 \pm t_{n-(p+1), \alpha/2} \times se(\hat Y_0)\] For a 95% CI the t-multiplier is \(t_{n-(p+1), 0.025}\).
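A sketch of this interval for the mean response at a chosen \(\boldsymbol{X_0}\) (the data and the value of \(\boldsymbol{X_0}\) are simulated assumptions for illustration):

```python
import numpy as np
from scipy import stats

# Simulated illustration: 95% confidence interval for E(Y | X0).
rng = np.random.default_rng(3)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

C = np.linalg.inv(X.T @ X)
beta_hat = C @ X.T @ y
mse = np.sum((y - X @ beta_hat) ** 2) / (n - p - 1)

x0 = np.array([1.0, 0.5, -1.0])               # X0 = (1, X01, X02)'
y0_hat = x0 @ beta_hat                        # point estimate of the mean response
se_y0 = np.sqrt(mse * x0 @ C @ x0)            # se(hat Y0)
t_mult = stats.t.ppf(0.975, df=n - (p + 1))   # t_{n-(p+1), 0.025}
ci = (y0_hat - t_mult * se_y0, y0_hat + t_mult * se_y0)
```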
8.4.2 Prediction interval for a new response
Prediction interval for a new observation \(Y_{new}\) when the predictor variable values are \(\boldsymbol{X_0} = (1, X_{01}, \ldots, X_{0p})'\), \(Y_{new} = \boldsymbol{X'_0 \beta} + \epsilon\)
\[\hat Y_{new} = \boldsymbol{X'_0 \hat \beta}\]
\[Var(\boldsymbol{X'_0 \hat \beta } + \epsilon) = Var(\hat Y_0 + \epsilon) = Var(\hat Y_0) + Var(\epsilon) = \sigma^2 (1 + \boldsymbol{X'_0 (X'X)^{-1} X_0})\]
\((1-\alpha)100\%\) prediction interval
\[\hat Y_0 \pm t_{n-(p+1), \alpha/2} \times \sqrt{MSE + se(\hat Y_0)^2}\]
8.4.3 Comments
Because the prediction interval has the extra MSE term, a confidence interval for the mean response will always be narrower than the corresponding prediction interval for a new response.
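This comparison can be checked numerically; the sketch below (simulated data, illustrative \(\boldsymbol{X_0}\)) computes both standard errors at the same point and confirms the prediction interval is wider:

```python
import numpy as np
from scipy import stats

# Illustration: at the same X0, the prediction interval is always wider
# than the confidence interval, because of the extra MSE term.
rng = np.random.default_rng(4)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

C = np.linalg.inv(X.T @ X)
beta_hat = C @ X.T @ y
mse = np.sum((y - X @ beta_hat) ** 2) / (n - p - 1)
t_mult = stats.t.ppf(0.975, df=n - (p + 1))

x0 = np.array([1.0, 0.5, -1.0])
se_mean = np.sqrt(mse * x0 @ C @ x0)      # se(hat Y0), used in the CI
se_pred = np.sqrt(mse + se_mean ** 2)     # sqrt(MSE + se(hat Y0)^2), used in the PI
ci_width = 2 * t_mult * se_mean
pi_width = 2 * t_mult * se_pred
```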
Factors affecting the width of the confidence interval for the mean response
Factors affecting the width of the prediction interval for a new response