Solutions: https://www.paulamoraga.com/course-aramco/99-problems-4regression-solutions.html

1 Linear regression

1.1 Baby weights, Part I

The Child Health and Development Studies investigate a range of topics. One study considered all pregnancies between 1960 and 1967 among women in the Kaiser Foundation Health Plan in the San Francisco East Bay area. Here, we study the relationship between smoking and weight of the baby. The variable smoke is coded 1 if the mother is a smoker, and 0 if not. The summary table below shows the results of a linear regression model for predicting the average birth weight of babies, measured in ounces, based on the smoking status of the mother.

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    123.05        0.65   189.60    0.0000
smoke           -8.94        1.03    -8.65    0.0000

  1. Write the equation of the regression model.

  2. Interpret the slope in this context, and calculate the predicted birth weight of babies born to smoker and non-smoker mothers.

  3. Is there a statistically significant relationship between the average birth weight and smoking?
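As a hedged illustration of how a fitted line with a 0/1 predictor is evaluated, the sketch below plugs the two possible values of smoke into the estimates from the summary table in this exercise. The helper name `predict_bwt` is made up for this example; only the two coefficients come from the table.

```python
# Coefficients copied from the summary table of this exercise (ounces).
INTERCEPT = 123.05
SLOPE_SMOKE = -8.94


def predict_bwt(smoke: int) -> float:
    """Predicted average birth weight (oz) for smoke = 0 or smoke = 1.

    Hypothetical helper for illustration, not part of the course material.
    """
    if smoke not in (0, 1):
        raise ValueError("smoke must be 0 (non-smoker) or 1 (smoker)")
    return INTERCEPT + SLOPE_SMOKE * smoke


# Evaluate the fitted line at both levels of the indicator variable.
for status in (0, 1):
    print(f"smoke={status}: {predict_bwt(status):.2f} oz")
```

With an indicator predictor, the difference between the two predictions is exactly the slope, which is what the interpretation in question 2 relies on.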

1.2 Baby weights, Part II

Part I introduced a data set on the birth weights of babies. Another variable we consider is parity, which is 1 if the child is the first born, and 0 otherwise. The summary table below shows the results of a linear regression model for predicting the average birth weight of babies, measured in ounces, from parity.

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    120.07        0.60   199.94    0.0000
parity          -1.93        1.19    -1.62    0.1052

  1. Write the equation of the regression model.

  2. Interpret the slope in this context, and calculate the predicted birth weight of first borns and others.

  3. Is there a statistically significant relationship between the average birth weight and parity?
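The t value and p-value columns of the table follow from the estimate and its standard error, and the sketch below shows one way to check that relationship for the parity row. The exercise does not state the sample size, so the p-value here uses the standard normal distribution as a large-sample stand-in for the t distribution; that substitution is an assumption, and the function names are made up for this example.

```python
import math

# Values copied from the parity row of the summary table of this exercise.
ESTIMATE = -1.93
STD_ERROR = 1.19


def t_statistic(estimate: float, std_error: float) -> float:
    """Test statistic for H0: slope = 0, i.e. estimate / standard error."""
    return estimate / std_error


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value under a standard normal reference distribution.

    Assumption: the sample is large enough that the normal distribution
    approximates the t distribution used in the table.
    """
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)).
    return math.erfc(abs(t) / math.sqrt(2))


t = t_statistic(ESTIMATE, STD_ERROR)
p = two_sided_p_normal(t)
print(f"t = {t:.2f}, approximate two-sided p = {p:.4f}")
```

The result should land close to the tabulated values; small differences come from rounding in the table and from the normal approximation.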

1.3 Baby weights, Part III

In Parts I and II we considered the variables smoke and parity one at a time when modeling the birth weights of babies. A more realistic approach to modeling infant weights is to consider all possibly related variables at once. Other variables of interest include length of pregnancy in days (gestation), mother’s age in years (age), mother’s height in inches (height), and mother’s pregnancy weight in pounds (weight).

Use the data babies.csv (LINK) to answer the following questions.

  1. Write the equation of the regression model that relates birth weights of babies (bwt) to variables gestation, parity, age, height, weight, and smoke.
  2. Interpret the slopes of gestation, age and parity in this context.
  3. The coefficient for parity differs from the one in the linear model of Part II. Why might there be a difference?
  4. Calculate the residual for the first observation in the data set.
  5. Interpret the adjusted \(R^2\).
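
Questions 4 and 5 rest on two small formulas: a residual is the observed value minus the model's prediction, and the adjusted \(R^2\) penalizes \(R^2\) for the number of predictors. The sketch below writes both out under stated assumptions. The function names and the toy numbers are hypothetical and are not taken from babies.csv; the coefficient dictionary would be filled in from your own fitted model.

```python
def predict(coefs: dict, row: dict) -> float:
    """Prediction from a linear model: intercept + sum of slope * value.

    `coefs` maps term names to estimates and must contain "(Intercept)";
    `row` maps each predictor name to its value for one observation.
    """
    slopes = {name: b for name, b in coefs.items() if name != "(Intercept)"}
    return coefs["(Intercept)"] + sum(b * row[name] for name, b in slopes.items())


def residual(observed: float, predicted: float) -> float:
    """Residual = observed response minus predicted response."""
    return observed - predicted


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Toy numbers, made up purely to exercise the functions.
toy_coefs = {"(Intercept)": 1.0, "x": 2.0}
toy_row = {"x": 3.0}
toy_pred = predict(toy_coefs, toy_row)
print(residual(8.0, toy_pred))
print(adjusted_r2(0.5, n=11, k=2))
```

For the exercise itself, the predictors would be gestation, parity, age, height, weight, and smoke, with bwt as the observed response.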