Solutions: https://www.paulamoraga.com/course-aramco/99-problems-2ciht-solutions.html
In 2013, a Research Foundation reported that “45% of U.S. adults report that they live with one or more chronic conditions”. However, this value was based on a sample, so it may not be a perfect estimate for the population parameter of interest on its own. The study reported a standard error of about 1.2%, and a normal model may reasonably be used in this setting. Create a 95% confidence interval for the proportion of U.S. adults who live with one or more chronic conditions. Also interpret the confidence interval in the context of the study.
Solution
\(\hat p \pm z^* \times SE = \hat p \pm z^* \times \sqrt{\hat p(1-\hat p)/n} = 0.45 \pm 1.96 \times 0.012 = (0.42648, 0.47352)\)
Assumptions: Observations are independent, and the sampling distribution of \(\hat p\) is approximately normal
\(\hat p = 0.45\)
\(z^* = 1.96\), the absolute value of the number such that \(\alpha/2=0.05/2\) of the probability in the \(N(0,1)\) is below it (qnorm(0.025) = -1.96)
c(0.45-1.96*0.012, 0.45+1.96*0.012)
We are 95% confident that the proportion of U.S. adults who live with one or more chronic conditions is between 42.6% and 47.4%.
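As a quick check, the interval can be reproduced in R from the reported estimate and standard error. This is a minimal sketch: the variable names (`p_hat`, `se`, `z_star`, `ci`) are illustrative, and `qnorm(0.975)` is used so that the critical value comes out positive.

```r
# Reported point estimate and standard error (from the problem statement)
p_hat <- 0.45
se <- 0.012

# Upper-tail critical value for a 95% interval
z_star <- qnorm(0.975)

# Lower and upper bounds of the interval
ci <- p_hat + c(-1, 1) * z_star * se
round(ci, 4)
```

Writing the bounds as `p_hat + c(-1, 1) * margin` keeps the lower and upper limits in one vector and avoids typing the margin twice.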
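A website is trying to increase registration for first-time visitors, exposing 1% of these visitors to a new site design. Of 752 randomly sampled visitors over a month who saw the new design, 64 registered. Construct and interpret a 90% confidence interval for the fraction of first-time visitors who would register under the new design.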
Solution
\(\hat p \pm z^* \times SE = \hat p \pm z^* \times \sqrt{\hat p(1-\hat p)/n} = 0.085 \pm 1.64 \times \sqrt{0.085(1-0.085)/752} = (0.0683, 0.1017)\)
\(\hat p = 64/752 = 0.085\)
Assumptions: Independent observations (random sample) and at least 10 successes (64) and 10 failures (\(752 - 64 = 688\))
SE: sqrt(0.085*(1-0.085)/752) = 0.0102
\(z^* = 1.64\), the absolute value of the number such that \(\alpha/2=0.10/2\) of the probability in the \(N(0,1)\) is below it (qnorm(0.05) = -1.64)
c(0.085-1.64*0.0102, 0.085+1.64*0.0102)
We are 90% confident that the fraction of first-time visitors of the site who would register under the new design is between 6.8% and 10.2%.
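The same interval can be computed in R without rounding intermediate values. The sketch below is illustrative only (the names `x`, `conditions_ok`, and `ci` are not from the original solution); it also checks the success-failure condition before building the interval. Carried at full precision, the bounds come out at roughly 0.068 and 0.102, so small differences from a hand calculation reflect rounding of \(\hat p\), SE, and \(z^*\).

```r
x <- 64    # registrations observed
n <- 752   # sampled visitors who saw the new design
p_hat <- x / n

# Success-failure condition: at least 10 successes and 10 failures
conditions_ok <- (x >= 10) && (n - x >= 10)

se <- sqrt(p_hat * (1 - p_hat) / n)
z_star <- qnorm(0.95)   # a 90% interval leaves 5% in each tail
ci <- p_hat + c(-1, 1) * z_star * se
round(ci, 3)
```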
Do a majority of US adults believe raising the minimum wage will help the economy, or is there a majority who do not believe this? A survey of 1,000 US adults found that 42% believe it will help the economy. Conduct an appropriate hypothesis test to help answer the research question.
Solution
\(H_0: p = 0.50\)
\(H_1: p \neq 0.50\)
(\(p_0=0.50\))
\(\alpha = 0.05\)
\[Z = \frac{\hat P - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \sim N(0,1)\]
Assumptions: Independent observations (random sample) and number of expected successes (\(n \times p_0 = 500\)) and failures (\(n \times (1-p_0) = 500\)) under the null are both at least 10
n <- 1000
p_hat <- 0.42
p0 <- 0.5
z <- (p_hat-p0)/sqrt((p0*(1-p0))/n)
z
## [1] -5.059644
Probability of observing a test statistic value of -5.06 or more extreme (in both directions) in the null distribution.
2*pnorm(z)
## [1] 4.200394e-07
p-value < 0.05, we reject the null. We conclude that the fraction of US adults who believe raising the minimum wage will help the economy is not 50%. Because the observed value is less than 50% and we have rejected the null hypothesis, we can conclude that this belief is held by fewer than 50% of US adults.
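One way to cross-check the manual calculation is base R's `prop.test()`, which is not used in the original solution. With the continuity correction switched off (`correct = FALSE`), its chi-squared statistic equals \(z^2\) and its p-value matches the two-sided normal p-value. The count `x <- 420` is inferred from the reported 42% of 1,000 respondents.

```r
n <- 1000
x <- 420    # 42% of 1,000 respondents (count inferred from the percentage)
p0 <- 0.5

# Manual z statistic, as in the solution
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)

# Built-in test without continuity correction; its statistic is z^2
res <- prop.test(x = x, n = n, p = p0, alternative = "two.sided", correct = FALSE)

c(z_squared = z^2, chi_squared = unname(res$statistic))
c(manual = 2 * pnorm(-abs(z)), prop_test = res$p.value)
```

By default `prop.test()` applies a continuity correction, so leaving out `correct = FALSE` gives a slightly different statistic and p-value.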
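Results of a poll evaluating support for drilling for oil and natural gas off the coast of California are below. Conduct a hypothesis test to determine whether the proportion of college graduates who support drilling differs from that of non-college graduates.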
Response | College Grad: Yes | College Grad: No |
---|---|---|
Support | 154 | 132 |
Oppose | 180 | 126 |
Do not know | 104 | 131 |
Total | 438 | 389 |
Solution
Proportion of college graduates who support drilling: \(\hat p_c = 154/438=0.35\)
Proportion of non-college graduates who support drilling: \(\hat p_{nc} = 132/389=0.34\)
\(H_0: p_c = p_{nc}\)
\(H_1: p_c \neq p_{nc}\)
or
\(H_0: p_c - p_{nc} = 0\)
\(H_1: p_c - p_{nc} \neq 0\)
\(\alpha = 0.05\)
\[Z = \frac{(\hat P_1 - \hat P_2)-0}{\sqrt{\frac{\hat P (1-\hat P)}{n_1} + \frac{\hat P (1-\hat P)}{n_2}}} \sim N(0,1)\]
where \(\hat P = \frac{\mbox{total number of successes}}{\mbox{total number of cases}}\)
Assumptions: Independent observations (random sample) and number of pooled successes and pooled failures at least 10 for each group (\(n \hat p \geq 10\) and \(n (1 - \hat p) \geq 10\)). Pooled success rate: \(\hat p\) = (154+132)/(438+389) = 0.35, \(1-\hat p\) = 0.65. \(\hat p \times 438 = 151.47\), \((1-\hat p) \times 438 = 286.53\), \(\hat p \times 389 = 134.53\), \((1-\hat p) \times 389 = 254.47\)
n1 <- 438
p1 <- 154/438
n2 <- 389
p2 <- 132/389
phat <- (154+132)/(438+389)
z <- (p1-p2)/sqrt(phat*(1-phat)/n1 + phat*(1-phat)/n2)
z
## [1] 0.3701737
Probability of observing a test statistic equal to 0.37 or more extreme (in both directions) in the null distribution
2 * (1- pnorm(z))
## [1] 0.7112531
p-value > \(\alpha\), we fail to reject the null. The data do not provide strong evidence of a difference between the proportions of college graduates and non-college graduates who support off-shore drilling in California.
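The pooled test can be wrapped in a small function so that the condition check and the statistic are computed from the raw counts in one place. This is a sketch under the same assumptions as the solution; `two_prop_z` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: pooled two-proportion z-test from counts
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p_pool <- (x1 + x2) / (n1 + n2)   # pooled success rate under H0
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  z <- (p1 - p2) / se
  # Success-failure condition checked with the pooled rate in each group
  ok <- all(c(n1, n2) * p_pool >= 10, c(n1, n2) * (1 - p_pool) >= 10)
  list(z = z, p_value = 2 * pnorm(-abs(z)), conditions_ok = ok)
}

res <- two_prop_z(154, 438, 132, 389)
round(c(z = res$z, p_value = res$p_value), 4)
```

Using `2 * pnorm(-abs(z))` gives the two-sided p-value regardless of the sign of `z`, so the same function works if the groups are passed in the opposite order.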
New York is known as “the city that never sleeps”. A random sample of 25 New Yorkers were asked how much sleep they get per night. Statistical summaries of these data are shown below. The point estimate suggests New Yorkers sleep less than 8 hours a night on average. Is the result statistically significant?
n | \(\bar x\) | s | min | max |
---|---|---|---|---|
25 | 7.73 | 0.77 | 6.17 | 9.78 |
Solution
\(H_0: \mu = 8\) (New Yorkers sleep 8 hrs per night on average)
\(H_1: \mu \neq 8\) (New Yorkers sleep less or more than 8 hrs per night on average)
(\(\mu_0\) is 8)
\(\alpha = 0.05\)
\[T=\frac{\bar X - \mu_0}{S/\sqrt{n}} \sim t(n-1)\]
Degrees of freedom \(n-1=25-1=24\)
Assumptions: Independent observations (random sample). Normality (normality or sample size \(\geq\) 30). In this case the min/max suggest there are no concerning outliers.
n <- 25
barx <- 7.73
s <- 0.77
t <- (barx-8)/(s/sqrt(n))
t
## [1] -1.753247
Probability of observing a test statistic equal to -1.75 or more extreme (in both directions) in the t-distribution.
df <- n-1
2*pt(t, df = df)
## [1] 0.09232523
p-value > \(\alpha\), we fail to reject the null. The data do not provide strong evidence that New Yorkers sleep more or less than 8 hours per night on average.
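Because only summary statistics are available here, R's `t.test()` cannot be applied directly (it needs the raw observations). A small helper can do the same calculation from \(\bar x\), \(s\), and \(n\). The function name `t_test_summary` is hypothetical and the code is a sketch of the steps above.

```r
# Hypothetical helper: one-sample t-test from summary statistics
t_test_summary <- function(xbar, s, n, mu0) {
  t_stat <- (xbar - mu0) / (s / sqrt(n))
  df <- n - 1
  # Using -abs(t) gives the two-sided p-value whatever the sign of t
  p_value <- 2 * pt(-abs(t_stat), df = df)
  c(t = t_stat, df = df, p_value = p_value)
}

sleep <- t_test_summary(xbar = 7.73, s = 0.77, n = 25, mu0 = 8)
round(sleep, 4)
```

Note that `2*pt(t, df)` is only a valid two-sided p-value when the statistic is negative; the `-abs()` form avoids having to switch to `2*(1-pt(t, df))` for positive statistics.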
Georgianna claims that in a small city renowned for its music school, the average child takes less than 5 years of piano lessons. We have a random sample of 20 children from the city, with a mean of 4.6 years of piano lessons and a standard deviation of 2.2 years.
Solution
\(H_0: \mu = 5\)
\(H_1: \mu \neq 5\)
(\(\mu_0\) is 5)
(Alternatively, we could have decided to test \(H_0: \mu \geq 5\) vs. \(H_1: \mu < 5\))
\(\alpha = 0.05\)
\[T=\frac{\bar X - \mu_0}{S/\sqrt{n}} \sim t(n-1)\]
Assumptions: Independent observations (random sample) and normality (normality or sample size \(\geq\) 30). We assume the distribution of years of piano lessons is approximately normal.
n <- 20
barx <- 4.6
s <- 2.2
t <- (barx-5)/(s/sqrt(n))
t
## [1] -0.8131156
Probability of observing a test statistic of -0.81 or more extreme (in both directions) in the t-distribution.
df <- n-1
2*pt(t, df = df)
## [1] 0.4262241
p-value > \(\alpha\), we fail to reject the null. We do not have sufficiently strong evidence to reject the notion that the average is 5 years.
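The one-sided alternative mentioned above (\(H_1: \mu < 5\)) matches Georgianna's claim more directly. The sketch below, with illustrative variable names, compares the two p-values; for the one-sided test only the lower tail counts.

```r
n <- 20
barx <- 4.6
s <- 2.2
mu0 <- 5

t_stat <- (barx - mu0) / (s / sqrt(n))
df <- n - 1

p_two_sided <- 2 * pt(-abs(t_stat), df = df)
p_one_sided <- pt(t_stat, df = df)   # lower tail only, for H1: mu < 5

round(c(two_sided = p_two_sided, one_sided = p_one_sided), 4)
```

Because the statistic is negative, the one-sided p-value is half the two-sided one. It is still well above 0.05, so the conclusion does not change.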
Assumptions: The same as for the test above (independent observations and approximate normality).
\(\sigma\) is unknown, so we use \(s\). We use a t distribution with \(n-1=20-1=19\) degrees of freedom.
\(\bar x \pm t^*_{19} \times SE = \bar x \pm t^*_{19} \times \frac{s}{\sqrt{n}} = 4.6 \pm 2.09 \times \frac{2.2}{\sqrt{20}} = (3.57, 5.63)\).
\(t^*_{19} = 2.09\) is the absolute value of the number such that \(\alpha/2=0.05/2\) of the probability in the t(19) distribution is below it.
qt(0.025, 19)
## [1] -2.093024
We are 95% confident that the average number of years a child takes piano lessons in this city is between 3.57 and 5.63 years.
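The interval can also be computed directly in R. This is a minimal sketch with illustrative names; `qt(0.975, ...)` is used so the critical value is positive, and the last lines check whether the null value 5 falls inside the interval, which is the confidence-interval counterpart of the two-sided test.

```r
n <- 20
barx <- 4.6
s <- 2.2

t_star <- qt(0.975, df = n - 1)   # positive critical value, about 2.09
ci <- barx + c(-1, 1) * t_star * s / sqrt(n)
round(ci, 2)

# The null value 5 lies inside the interval, in line with the two-sided test
contains_mu0 <- ci[1] <= 5 && 5 <= ci[2]
contains_mu0
```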
A market researcher wants to evaluate car insurance savings at a competing company. Based on past studies he is assuming that the standard deviation of savings is USD 100. He wants to collect data such that he can get a margin of error of no more than USD 10 at a 95% confidence level. How large of a sample should he collect?
Solution
When the population standard deviation is known, a 95% confidence interval for the population mean is
\[\bar X \pm z^* \times SE = \bar X \pm z^* \times \frac{\sigma}{\sqrt{n}}\]
where \(z^*\) is the absolute value of the number such that \(\alpha/2=0.05/2\) of the probability in the standard normal distribution is below it
qnorm(0.025)
## [1] -1.959964
The margin of error is \(z^* \times SE = 1.96 \times \sigma/\sqrt{n} = 1.96 \times 100/\sqrt{n}\).
We want this value to be at most 10, which leads to \(1.96 \times 100/\sqrt{n} \leq 10\), \(1.96 \times 100/10 \leq \sqrt{n}\), \(19.6^2 \leq n\), \(n \geq 384.16\).
Thus, we need a sample size of at least 385 (round up for sample size calculations).
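The calculation generalizes to any standard deviation, margin of error, and confidence level. The sketch below uses a hypothetical helper, `sample_size_mean`, and assumes a known \(\sigma\) as in the problem.

```r
# Hypothetical helper: smallest n whose margin of error is at most `margin`
sample_size_mean <- function(sigma, margin, conf_level = 0.95) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  ceiling((z_star * sigma / margin)^2)   # always round up
}

n_needed <- sample_size_mean(sigma = 100, margin = 10)
n_needed

# Margin of error actually achieved with that sample size
qnorm(0.975) * 100 / sqrt(n_needed)
```

Rounding up with `ceiling()` matters: with one observation fewer, the margin of error would be slightly above USD 10.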
A group of researchers are interested in the possible effects of distracting stimuli during eating, such as an increase or decrease in the amount of food consumption. To test this hypothesis, they monitored food intake for a group of 44 patients who were randomized into two equal groups. The treatment group ate lunch while playing solitaire, and the control group ate lunch without any added distractions. Patients in the treatment group ate 52.1 grams of biscuits, with a standard deviation of 45.1 grams, and patients in the control group ate 27.1 grams of biscuits, with a standard deviation of 26.4 grams. Do these data provide convincing evidence that the average food intake (measured in amount of biscuits consumed) is different for the patients in the treatment group? Assume that conditions for inference are satisfied.
Solution
\(H_0: \mu_1 = \mu_2\)
\(H_1: \mu_1 \neq \mu_2\)
or
\(H_0: \mu_1- \mu_2 = 0\)
\(H_1: \mu_1 - \mu_2 \neq 0\)
\(\alpha = 0.05\)
\[T = \frac{(\bar X_1 - \bar X_2) - 0}{\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2} }} \sim t(\min(n_1-1,n_2-1))\]
Assumptions: Independent observations and normality (distribution normal or sample sizes \(\geq\) 30)
n1 <- 22
x1 <- 52.1
s1 <- 45.1
n2 <- 22
x2 <- 27.1
s2 <- 26.4
t <- (x1-x2)/sqrt(s1^2/n1+s2^2/n2)
t
## [1] 2.243845
Probability of observing a value of the test statistic equal to 2.24 or more extreme (in both directions) assuming the null is true
df <- min(n1-1, n2-1) # 21
2*(1-pt(t, df))
## [1] 0.03575082
p-value < \(\alpha\). We reject the null. The data provide strong evidence that the average food consumption differs between patients in the treatment and control groups.
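The solution uses the conservative choice \(\min(n_1-1, n_2-1)\) for the degrees of freedom. A common alternative, not used in the original solution, is the Welch-Satterthwaite approximation, which is what R's `t.test()` applies by default when raw data are available. The sketch below compares the two from the summary statistics; all variable names are illustrative.

```r
n1 <- 22; x1 <- 52.1; s1 <- 45.1   # treatment (solitaire)
n2 <- 22; x2 <- 27.1; s2 <- 26.4   # control

v1 <- s1^2 / n1
v2 <- s2^2 / n2
t_stat <- (x1 - x2) / sqrt(v1 + v2)

# Conservative df used in the solution vs. Welch-Satterthwaite approximation
df_conservative <- min(n1 - 1, n2 - 1)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

p_conservative <- 2 * pt(-abs(t_stat), df = df_conservative)
p_welch <- 2 * pt(-abs(t_stat), df = df_welch)

round(c(df_welch = df_welch, p_conservative = p_conservative, p_welch = p_welch), 4)
```

The Welch degrees of freedom are larger than 21, so the corresponding p-value is somewhat smaller than the conservative one; both lead to rejecting the null at \(\alpha = 0.05\).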