8 Confidence intervals and hypothesis testing
8.1 Confidence interval for a regression coefficient \(\beta_j\)
\[\boldsymbol{\hat \beta} \sim N(\boldsymbol{\beta}, \sigma^2 \boldsymbol{(X'X)^{-1}})\]
\(Var(\boldsymbol{\hat \beta}) = \sigma^2 \boldsymbol{(X'X)^{-1}}\)
\(\hat Var(\boldsymbol{\hat \beta}) = \hat \sigma^2 \boldsymbol{(X'X)^{-1}} = MSE \boldsymbol{(X'X)^{-1}}\)
\(\hat Var(\hat \beta_j)\) is the \(j\)th diagonal element of the matrix \(\hat Var(\boldsymbol{\hat \beta})\)
\(se(\hat \beta_j) = \sqrt{\hat \sigma^2 C_{jj}}\), where \(C_{jj}\) is the \(j\)th diagonal element of \(\boldsymbol{(X'X)^{-1}}\).
\[\frac{\hat \beta_j - \beta_j}{se(\hat \beta_j)} = \frac{\hat \beta_j - \beta_j}{\sqrt{\hat \sigma^2 C_{jj}}} \sim t_{n-p-1}\]
\((1-\alpha)100\%\) confidence interval for \(\beta_j\): \[\hat \beta_j \pm t_{n-p-1,\alpha/2} \ \ se(\hat \beta_j) = \hat \beta_j \pm t_{n-p-1,\alpha/2}\ \ \sqrt{\hat \sigma^2 C_{jj}}\]
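As a sketch, the interval above can be computed directly from the matrix formulas; the data below are simulated purely for illustration (all variable names are illustrative, and `numpy`/`scipy` are assumed available).

```python
import numpy as np
from scipy import stats

# Simulated illustration (not from the text): n observations, p predictors.
rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept
beta = np.array([1.0, 2.0, -0.5])                           # true coefficients
y = X @ beta + rng.normal(size=n)

C = np.linalg.inv(X.T @ X)                                  # (X'X)^{-1}
beta_hat = C @ X.T @ y                                      # least squares estimate
resid = y - X @ beta_hat
mse = resid @ resid / (n - p - 1)                           # hat sigma^2 = MSE
se = np.sqrt(mse * np.diag(C))                              # se(beta_hat_j) = sqrt(MSE * C_jj)

alpha = 0.05
t_mult = stats.t.ppf(1 - alpha / 2, df=n - p - 1)           # t_{n-p-1, alpha/2}
lower, upper = beta_hat - t_mult * se, beta_hat + t_mult * se
```

Each `(lower[j], upper[j])` pair is the \(95\%\) interval for \(\beta_j\).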
8.2 Hypothesis test for a regression coefficient \(\beta_j\)
To test \(H_0: \beta_j = 0\) vs \(H_1: \beta_j \neq 0\) we standardize the coefficient estimate, using the pivotal quantity
\[\frac{\hat \beta_j - \beta_j}{se(\hat \beta_j)} = \frac{\hat \beta_j - \beta_j}{\sqrt{\hat \sigma^2 C_{jj}}} \sim t_{n-p-1}\]
Under the null hypothesis that \(\beta_j=0\),
\[t_j = \frac{\hat \beta_j}{se(\hat \beta_j)} \sim t_{n-p-1}\]
Hence, a large (absolute) value of \(t_j\) leads to rejection of this null hypothesis. If \(\hat \sigma\) were replaced by a known value \(\sigma\), then \(t_j\) would have a standard normal distribution. The difference between the tail quantiles of a t-distribution and a standard normal becomes negligible as the sample size increases, and so we typically use the normal quantiles.
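A minimal sketch of this coefficient t-test on simulated data (names and numbers are illustrative assumptions, not from the text); the second predictor is given a true nonzero coefficient, the third a true zero coefficient:

```python
import numpy as np
from scipy import stats

# Simulated data: true coefficients (1, 2, 0), so predictor 2 should be
# clearly significant while predictor 3 is pure noise.
rng = np.random.default_rng(1)
n, p = 80, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

C = np.linalg.inv(X.T @ X)
beta_hat = C @ X.T @ y
mse = np.sum((y - X @ beta_hat) ** 2) / (n - p - 1)
se = np.sqrt(mse * np.diag(C))

t_stat = beta_hat / se                                 # t_j = beta_hat_j / se(beta_hat_j)
p_val = 2 * stats.t.sf(np.abs(t_stat), df=n - p - 1)   # two-sided p-value
```

`stats.t.sf` gives the upper tail probability, so doubling it yields the two-sided p-value.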
8.3 Hypothesis test for a group of coefficients
To test for the significance of groups of coefficients simultaneously we use the F-test.
Model M1 with \(p_1\) predictors nested in model M2 with \(p_2\) predictors, \(p_1 < p_2\).
\(H_0: \beta_{p_1+1} = \ldots = \beta_{p_2} = 0\) (M1 model fits as well as M2 model) vs.
\(H_1: \beta_{j} \neq 0\) for at least one \(j\) with \(p_1 < j \leq p_2\) (M2 fits better than M1)
Under the null hypothesis that the smaller model is correct,
\[F=\frac{(RSS(reduced) - RSS(full))/(p_2-p_1)}{RSS(full)/(n-p_2-1)} \sim F_{p_2-p_1,\ n-p_2-1}\]
where \(RSS(full)\) is the residual sum-of-squares for the least squares fit of the bigger model with \(p_2+1\) parameters, and \(RSS(reduced)\) the same for the nested smaller model with \(p_1 + 1\) parameters.
The F statistic measures the change in residual sum-of-squares per additional parameter in the bigger model, and it is normalized by an estimate of \(\sigma^2\) (MSE).
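The F statistic can be sketched as follows on simulated data (the setup is an illustrative assumption: a full model with \(p_2 = 3\) predictors against a reduced model keeping only the first \(p_1 = 1\)):

```python
import numpy as np
from scipy import stats

# Simulated nested-model comparison: the extra coefficients are truly zero,
# so the null hypothesis (reduced model is adequate) holds here.
rng = np.random.default_rng(2)
n, p1, p2 = 60, 1, 3
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, p2))])
X_red = X_full[:, : p1 + 1]
y = X_full @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares of the least squares fit."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta_hat
    return r @ r

F = ((rss(X_red, y) - rss(X_full, y)) / (p2 - p1)) / (rss(X_full, y) / (n - p2 - 1))
p_val = stats.f.sf(F, p2 - p1, n - p2 - 1)   # P(F_{p2-p1, n-p2-1} > F)
```

Because the models are nested, \(RSS(reduced) \geq RSS(full)\), so the statistic is always nonnegative.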
8.4 Confidence and prediction intervals
8.4.1 Confidence interval for the mean response
Point estimate and confidence interval for the mean response when the predictor variable values are \(\boldsymbol{X_0} = (1, X_{01}, \ldots, X_{0p})'\), \(E(Y|\boldsymbol{X_0}) = \boldsymbol{X'_0} \boldsymbol{\beta}\).
Fitted or predicted value of the response when the predictor values are \(\boldsymbol{X_0}\) \[\hat Y_0 = \boldsymbol{X'_0 \hat \beta}\]
\[Var(\hat Y_0) = \boldsymbol{X'_0} Var(\boldsymbol{\hat \beta}) \boldsymbol{X_0} = \sigma^2 \boldsymbol{X'_0 (X'X)^{-1} X_0}\] \[se(\hat Y_0) = \sqrt{MSE\ \boldsymbol{X_0'(X'X)^{-1}X_0}}\]
\((1-\alpha)100\%\) confidence interval:
\[\mbox{sample estimate} \pm \mbox{t-multiplier} \times \mbox{standard error}\]
\[\hat Y_0 \pm t_{n-(p+1), \alpha/2} \times se(\hat Y_0)\] For a 95% CI the t-multiplier is \(t_{n-(p+1), 0.025}\).
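A sketch of this interval for the mean response at a chosen \(\boldsymbol{X_0}\) (the data and the value of \(\boldsymbol{X_0}\) are simulated assumptions for illustration):

```python
import numpy as np
from scipy import stats

# Simulated illustration: 95% confidence interval for E(Y | X0).
rng = np.random.default_rng(3)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

C = np.linalg.inv(X.T @ X)
beta_hat = C @ X.T @ y
mse = np.sum((y - X @ beta_hat) ** 2) / (n - p - 1)

x0 = np.array([1.0, 0.5, -1.0])               # X0 = (1, X01, X02)'
y0_hat = x0 @ beta_hat                        # point estimate of the mean response
se_y0 = np.sqrt(mse * x0 @ C @ x0)            # se(hat Y0)
t_mult = stats.t.ppf(0.975, df=n - (p + 1))   # t_{n-(p+1), 0.025}
ci = (y0_hat - t_mult * se_y0, y0_hat + t_mult * se_y0)
```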
8.4.2 Prediction interval for a new response
Prediction interval for a new observation \(Y_{new}\) when the predictor variable values are \(\boldsymbol{X_0} = (1, X_{01}, \ldots, X_{0p})'\), \(Y_{new} = \boldsymbol{X'_0 \beta} + \epsilon\)
\[\hat Y_{new} = \boldsymbol{X'_0 \hat \beta}\]
\[Var(\boldsymbol{X'_0 \hat \beta } + \epsilon) = Var(\hat Y_0 + \epsilon) = Var(\hat Y_0) + Var(\epsilon) = \sigma^2 (1 + \boldsymbol{X'_0 (X'X)^{-1} X_0})\]
\((1-\alpha)100\%\) prediction interval
\[\hat Y_0 \pm t_{n-(p+1), \alpha/2} \times \sqrt{MSE + se(\hat Y_0)^2}\]
8.4.3 Comments
Because the prediction interval has the extra MSE term, a confidence interval for the mean response will always be narrower than the corresponding prediction interval for a new response.
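This comparison can be checked numerically; the sketch below (simulated data, illustrative \(\boldsymbol{X_0}\)) computes both standard errors at the same point and confirms the prediction interval is wider:

```python
import numpy as np
from scipy import stats

# Illustration: at the same X0, the prediction interval is always wider
# than the confidence interval, because of the extra MSE term.
rng = np.random.default_rng(4)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

C = np.linalg.inv(X.T @ X)
beta_hat = C @ X.T @ y
mse = np.sum((y - X @ beta_hat) ** 2) / (n - p - 1)
t_mult = stats.t.ppf(0.975, df=n - (p + 1))

x0 = np.array([1.0, 0.5, -1.0])
se_mean = np.sqrt(mse * x0 @ C @ x0)      # se(hat Y0), used in the CI
se_pred = np.sqrt(mse + se_mean ** 2)     # sqrt(MSE + se(hat Y0)^2), used in the PI
ci_width = 2 * t_mult * se_mean
pi_width = 2 * t_mult * se_pred
```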
Factors affecting the width of the confidence interval for the mean response
Factors affecting the width of the prediction interval for a new response