17  Example code MLR

17.1 Patient satisfaction

d <- read.table("data/patsat.txt", header = TRUE)
res <- lm(Satisfaction ~ Age + Severity + Anxiety, data = d)
summary(res)

Call:
lm(formula = Satisfaction ~ Age + Severity + Anxiety, data = d)

Residuals:
    Min      1Q  Median      3Q     Max 
-16.954  -7.154   1.550   6.599  14.888 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 162.8759    25.7757   6.319 4.59e-06 ***
Age          -1.2103     0.3015  -4.015  0.00074 ***
Severity     -0.6659     0.8210  -0.811  0.42736    
Anxiety      -8.6130    12.2413  -0.704  0.49021    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.29 on 19 degrees of freedom
Multiple R-squared:  0.6727,    Adjusted R-squared:  0.621 
F-statistic: 13.01 on 3 and 19 DF,  p-value: 7.482e-05

17.2 Checking assumptions linear model

Independence

Random sample

Normality

To check normality we look at the normal Q-Q plot of the residuals The data show departure from normality because the plot does not resemble a straight line. This departure could be due to the few observations. There are only 23 observations.

plot(res, 2)

Constant variance and Linearity

Constant variance. The plots suggest that the variance of \(y\), \(\sigma^2\), is the same for all values of \(x\) since we do not see any structure in the residuals.

Linearity. Residual versus variables plots are randomly scattered. This indicates that assuming the effects of age, severity and anxiety level are linear is OK.

par(mfrow = c(2, 2))
plot(d$Age, rstandard(res), main = "Standardized residuals versus Age")
abline(h = 0, lty = 2)
plot(d$Severity, rstandard(res), main = "Standardized residuals versus Severity")
abline(h = 0, lty = 2)
plot(d$Anxiety, rstandard(res), main = "Standardized residuals versus Anxiety")
abline(h = 0, lty = 2)

17.3 Unusual observations

Outliers (unusual \(y\)). Standardized residuals less than -2 or greater than 2.

# Outliers
op <- which(abs(rstandard(res)) > 2)
op
named integer(0)

High leverage observations (unusual \(x\)). Leverage greater than 2 times the average of all leverage values. That is, \(h_{ii} > 2(p+1)/n\).

# High leverage observations
h <- hatvalues(res)
lp <- which(h > 2*length(res$coefficients)/nrow(d))
lp
named integer(0)

Influential values. Cook’s distance higher than \(4/n\).

# Influential observations
ip <- which(cooks.distance(res) > 4/nrow(d))
ip
11 13 15 17 
11 13 15 17 
plot(res, 4)

d[unique(c(lp, op, ip)), ]
   Satisfaction Age Severity Anxiety
11           89  29       48     2.4
13           47  38       55     2.2
15           57  53       54     2.2
17           79  33       56     2.5

There are not outliers or high leverage observations. Observations 11, 13, 15 and 17 have large Cook’s distance but it is not that different from the rest.