# 1 Confidence intervals and hypothesis testing

## 1.1 Chronic illness (CI proportion)

In 2013, a research foundation reported that “45% of U.S. adults report that they live with one or more chronic conditions”. However, this value was based on a sample, so on its own it may not be a perfect estimate of the population parameter of interest. The study reported a standard error of about 1.2%, and a normal model may reasonably be used in this setting. Create a 95% confidence interval for the proportion of U.S. adults who live with one or more chronic conditions, and interpret the confidence interval in the context of the study.

Solution

$$\hat p \pm z^* \times SE = \hat p \pm z^* \times \sqrt{\hat p(1-\hat p)/n} = 0.45 \pm 1.96 \times 0.012 = (0.42648, 0.47352)$$

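Assumptions: observations are independent, and the sampling distribution of $$\hat p$$ is approximately normal (the problem states that a normal model is reasonable).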

$$\hat p = 0.45$$

$$z^* = 1.96$$, the number such that $$\alpha/2=0.05/2$$ of the probability in the $$N(0,1)$$ distribution lies above $$z^*$$ (qnorm(0.975); by symmetry, qnorm(0.025) = -1.96)

```r
c(0.45-1.96*0.012, 0.45+1.96*0.012)
```

We are 95% confident that the proportion of U.S. adults who live with one or more chronic conditions is between 42.65% and 47.35%.
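The arithmetic above can be wrapped in a small R helper. This is a minimal sketch: `z_interval` is a hypothetical name, and it takes the point estimate and the standard error as given (as in this problem) instead of computing the SE from a sample size. Using `qnorm(1 - alpha/2)` returns the positive critical value directly, so no sign has to be flipped.

```r
# Hypothetical helper: normal-based CI from a point estimate and its standard error
z_interval <- function(estimate, se, level = 0.95) {
  alpha <- 1 - level
  z_star <- qnorm(1 - alpha / 2)  # positive critical value (about 1.96 for 95%)
  c(lower = estimate - z_star * se, upper = estimate + z_star * se)
}

# Chronic illness example: p_hat = 0.45, SE = 0.012
z_interval(0.45, 0.012)
```

With the unrounded critical value the limits are about 0.4265 and 0.4735, matching the hand calculation.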

## 1.2 Website registration (CI proportion)

A website is trying to increase registration for first-time visitors, exposing 1% of these visitors to a new site design. Of 752 randomly sampled visitors over a month who saw the new design, 64 registered.

1. Check any conditions required for constructing a confidence interval.
2. Compute the standard error.
3. Construct and interpret a 90% confidence interval for the fraction of first-time visitors of the site who would register under the new design (assuming stable behaviors by new visitors over time).

Solution

$$\hat p \pm z^* \times SE = \hat p \pm z^* \times \sqrt{\hat p(1-\hat p)/n} = 0.085 \pm 1.64 \times \sqrt{0.085(1-0.085)/752} \approx 0.085 \pm 0.0167 = (0.068, 0.102)$$

$$\hat p = 64/752 = 0.085$$

Assumptions:

• Observations are independent
• At least 10 successes ($$64>10$$) and at least 10 failures ($$(752-64)=688>10$$)

SE: sqrt(0.085*(1-0.085)/752) ≈ 0.0102

$$z^* = 1.64$$, the number such that $$\alpha/2=0.10/2$$ of the probability in the $$N(0,1)$$ distribution lies above $$z^*$$ (qnorm(0.95) = 1.64; by symmetry, qnorm(0.05) = -1.64)

```r
se <- sqrt(0.085*(1-0.085)/752)  # unrounded standard error
c(0.085-1.64*se, 0.085+1.64*se)
```

We are 90% confident that the fraction of first-time visitors of the site who would register under the new design is between 6.8% and 10.2%.
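The same steps can start from the raw counts. The sketch below (the function name `prop_wald_ci` is hypothetical) checks the success-failure condition, computes the SE from the unrounded $$\hat p$$, and returns the normal-approximation (Wald) interval used above.

```r
# Hypothetical helper: Wald CI for a proportion from raw counts
prop_wald_ci <- function(successes, n, level = 0.95) {
  failures <- n - successes
  # Success-failure condition for the normal approximation
  if (successes < 10 || failures < 10) {
    warning("fewer than 10 successes or failures; normal approximation may be poor")
  }
  p_hat <- successes / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
  z_star <- qnorm(1 - (1 - level) / 2)  # about 1.645 for a 90% interval
  c(p_hat = p_hat, se = se,
    lower = p_hat - z_star * se, upper = p_hat + z_star * se)
}

# Website example: 64 registrations among 752 visitors, 90% level
prop_wald_ci(64, 752, level = 0.90)
```

Carrying full precision gives roughly (0.068, 0.102). R's built-in `prop.test()` reports a Wilson score interval (with a continuity correction by default), so its limits will not match this Wald interval exactly.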

## 1.3 Minimum wage (HT proportion)

Do a majority of US adults believe raising the minimum wage will help the economy, or is there a majority who do not believe this? A survey of 1,000 US adults found that 42% believe it will help the economy. Conduct an appropriate hypothesis test to help answer the research question.

Solution

1. Null and alternative hypotheses

$$H_0: p = 0.50$$
$$H_1: p \neq 0.50$$

($$p_0=0.50$$)

2. Significance level

$$\alpha = 0.05$$

3. Test statistic

$Z = \frac{\hat P - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \sim N(0,1)$

Assumptions: independent observations (random sample), and the expected numbers of successes ($$n p_0 = 1000 \times 0.5 = 500$$) and failures ($$n (1-p_0) = 500$$) under the null hypothesis are both at least 10

```r
n <- 1000
p_hat <- 0.42
p0 <- 0.5
# The standard error uses the null value p0, not p_hat
z <- (p_hat-p0)/sqrt((p0*(1-p0))/n)
z
##  -5.059644
```
4. p-value

Probability of observing a test statistic value of -5.06 or more extreme (in both directions) in the null distribution.

```r
2*pnorm(z)  # z is negative, so double the lower-tail area
##  4.200394e-07
```
5. Decision

p-value < 0.05, we reject the null. We conclude that the fraction of US adults who believe raising the minimum wage will help the economy is not 50%. Because the observed value is less than 50% and we have rejected the null hypothesis, we can conclude that this belief is held by fewer than 50% of US adults.
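The test statistic and p-value can be packaged into a reusable function. This is a sketch (`prop_z_test` is a hypothetical name). It uses `2 * pnorm(-abs(z))`, which gives the two-sided p-value whether `z` is negative (as here) or positive, so there is no need to choose between `2*pnorm(z)` and `2*(1-pnorm(z))`. As a cross-check, R's `prop.test()` with `correct = FALSE` reports a chi-squared statistic equal to $$z^2$$ and the same p-value.

```r
# One-sample z test for a proportion (two-sided); the SE is computed under H0
prop_z_test <- function(p_hat, n, p0) {
  z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
  p_value <- 2 * pnorm(-abs(z))  # valid for either sign of z
  c(z = z, p_value = p_value)
}

res <- prop_z_test(0.42, 1000, 0.5)
res

# Cross-check: 420 successes out of 1000, no continuity correction
chk <- prop.test(420, 1000, p = 0.5, correct = FALSE)
unname(chk$statistic)  # equals z^2
chk$p.value            # equals the two-sided p-value above
```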

## 1.4 Offshore drilling (HT difference proportions)

Results of a poll evaluating support for drilling for oil and natural gas off the coast of California are below.

| | College grad | Not college grad |
|---|---|---|
| Support | 154 | 132 |
| Oppose | 180 | 126 |
| Do not know | 104 | 131 |
| Total | 438 | 389 |

1. What percent of college graduates and what percent of non-college graduates in this sample support drilling for oil and natural gas off the coast of California?
2. Conduct a hypothesis test to determine if the data provide strong evidence that the proportion of college graduates who support off-shore drilling in California is different from that of non-college graduates.

Solution

Percent of college graduates who support drilling: $$\hat p_c = 154/438 \approx 0.352$$, or about 35.2%

Percent of non-college graduates who support drilling: $$\hat p_{nc} = 132/389 \approx 0.339$$, or about 33.9%

1. Null and alternative hypotheses

$$H_0: p_c = p_{nc}$$
$$H_1: p_c \neq p_{nc}$$

or

$$H_0: p_c - p_{nc} = 0$$
$$H_1: p_c - p_{nc} \neq 0$$

2. Set $$\alpha$$

$$\alpha = 0.05$$

3. Test statistic

$Z = \frac{(\hat P_1 - \hat P_2)-0}{\sqrt{\frac{\hat P (1-\hat P)}{n_1} + \frac{\hat P (1-\hat P)}{n_2}}} \sim N(0,1)$

where $$\hat P = \frac{\mbox{total number of successes}}{\mbox{total number of cases}}$$

Assumptions: independent observations (random sample), and at least 10 expected successes and 10 expected failures in each group, computed with the pooled proportion ($$n \hat p \geq 10$$ and $$n (1 - \hat p) \geq 10$$). Pooled success rate: $$\hat p$$ = (154+132)/(438+389) = 286/827 ≈ 0.346, so $$1-\hat p$$ ≈ 0.654. Then $$\hat p \times 438 = 151.47$$, $$(1-\hat p) \times 438 = 286.53$$, $$\hat p \times 389 = 134.53$$ and $$(1-\hat p) \times 389 = 254.47$$, all well above 10.

```r
n1 <- 438
p1 <- 154/438
n2 <- 389
p2 <- 132/389
phat <- (154+132)/(438+389)  # pooled proportion under H0
z <- (p1-p2)/sqrt(phat*(1-phat)/n1 + phat*(1-phat)/n2)
z
##  0.3701737
```
4. p-value

Probability of observing a test statistic equal to 0.37 or more extreme (in both directions) in the null distribution

```r
2 * (1- pnorm(z))  # z is positive, so double the upper-tail area
##  0.7112531
```
5. Decision

p-value > $$\alpha$$, we fail to reject the null. The data do not provide strong evidence of a difference between the proportions of college graduates and non-college graduates who support off-shore drilling in California.
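A similar sketch for the two-proportion test (the function name `two_prop_z_test` is hypothetical) computes the pooled proportion, checks the expected counts, and returns the z statistic and two-sided p-value. The 2x2 chi-squared test from `prop.test()` without continuity correction is equivalent: its statistic equals $$z^2$$.

```r
# Two-sample z test for a difference in proportions, using the pooled proportion
two_prop_z_test <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p_pool <- (x1 + x2) / (n1 + n2)  # pooled estimate under H0: p1 = p2
  # Success-failure check with the pooled proportion in each group
  expected <- c(n1 * p_pool, n1 * (1 - p_pool), n2 * p_pool, n2 * (1 - p_pool))
  if (any(expected < 10)) warning("an expected count is below 10")
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  z <- (p1 - p2) / se
  c(z = z, p_value = 2 * pnorm(-abs(z)))
}

res2 <- two_prop_z_test(154, 438, 132, 389)
res2

# Cross-check against the 2x2 chi-squared test without continuity correction
chk2 <- prop.test(c(154, 132), c(438, 389), correct = FALSE)
unname(chk2$statistic)  # equals z^2
```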

## 1.5 Sleep habits of New Yorkers (HT means)

New York is known as “the city that never sleeps”. A random sample of 25 New Yorkers was asked how much sleep they get per night. Statistical summaries of these data are shown below. The point estimate suggests New Yorkers sleep less than 8 hours a night on average. Is the result statistically significant?

| n | $$\bar x$$ | s | min | max |
|---|---|---|---|---|
| 25 | 7.73 | 0.77 | 6.17 | 9.78 |

1. Write the hypotheses in symbols and in words.
2. Check conditions, then calculate the test statistic, T, and the associated degrees of freedom.
3. Find and interpret the p-value in this context.
4. What is the conclusion of the hypothesis test?

Solution

1. Null and alternative hypotheses

$$H_0: \mu = 8$$ (New Yorkers sleep 8 hrs per night on average)
$$H_1: \mu \neq 8$$ (New Yorkers sleep less or more than 8 hrs per night on average)

($$\mu_0$$ is 8)

2. Choose $$\alpha$$

$$\alpha = 0.05$$

3. Test statistic

$T=\frac{\bar X - \mu_0}{S/\sqrt{n}} \sim t(n-1)$

Degrees of freedom $$n-1=25-1=24$$

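Assumptions: independent observations (random sample) and approximate normality (normal data or sample size $$\geq$$ 30). Here $$n = 25 < 30$$, so we rely on the data being roughly normal; the min and max lie within about 2.7 standard deviations of the mean, which suggests there are no concerning outliers.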

```r
n <- 25
barx <- 7.73
s <- 0.77
t <- (barx-8)/(s/sqrt(n))
t
##  -1.753247
```
4. p-value

Probability of observing a test statistic equal to -1.75 or more extreme (in both directions) in the t(24) distribution.

```r
df <- n-1
2*pt(t, df = df)  # t is negative, so double the lower-tail area
##  0.09232523
```
5. Decision

p-value > $$\alpha$$, we fail to reject the null. The data do not provide strong evidence that New Yorkers sleep more or less than 8 hours per night on average.
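Base R's `t.test()` needs the raw observations, which this problem does not provide. A minimal sketch that works from the summary statistics instead (the helper name `t_test_summary` is hypothetical):

```r
# One-sample t test computed from summary statistics (two-sided)
t_test_summary <- function(xbar, s, n, mu0) {
  se <- s / sqrt(n)
  t_stat <- (xbar - mu0) / se
  df <- n - 1
  p_value <- 2 * pt(-abs(t_stat), df = df)  # valid for either sign of t_stat
  c(t = t_stat, df = df, p_value = p_value)
}

t_test_summary(xbar = 7.73, s = 0.77, n = 25, mu0 = 8)
```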

## 1.6 Play the piano (CI and HT means)

Georgianna claims that in a small city renowned for its music school, the average child takes less than 5 years of piano lessons. We have a random sample of 20 children from the city, with a mean of 4.6 years of piano lessons and a standard deviation of 2.2 years.

1. Evaluate Georgianna’s claim (or that the opposite might be true) using a hypothesis test.
2. Construct a 95% confidence interval for the number of years students in this city take piano lessons, and interpret it in context of the data.
3. Do your results from the hypothesis test and the confidence interval agree? Explain your reasoning.

Solution

1. Null and alternative hypotheses

$$H_0: \mu = 5$$
$$H_1: \mu \neq 5$$

($$\mu_0$$ is 5)

(Alternatively, we could have decided to test $$H_0: \mu \geq 5$$ vs. $$H_1: \mu < 5$$)

2. Choose $$\alpha$$

$$\alpha = 0.05$$

3. Test statistic

$T=\frac{\bar X - \mu_0}{S/\sqrt{n}} \sim t(n-1)$

Assumptions: Independent observations (random sample) and normality (normality or sample size $$\geq$$ 30). We assume the distribution of years of piano lessons is approximately normal.

```r
n <- 20
barx <- 4.6
s <- 2.2
t <- (barx-5)/(s/sqrt(n))
t
##  -0.8131156
```
4. p-value

Probability of observing a test statistic of -0.81 or more extreme (in both directions) in the t(19) distribution.

```r
df <- n-1
2*pt(t, df = df)  # t is negative, so double the lower-tail area
##  0.4262241
```
5. Decision

p-value > $$\alpha$$, we fail to reject the null. We do not have sufficiently strong evidence to reject the notion that the average is 5 years.
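The solution also mentions the one-sided alternative $$H_1: \mu < 5$$. Under that choice the p-value is only the lower-tail area, so it is half of the two-sided value (about 0.21), still well above 0.05, and the conclusion would not change. A short sketch of that calculation:

```r
# One-sided version of the piano test: H0: mu >= 5 vs H1: mu < 5
n <- 20
t_stat <- (4.6 - 5) / (2.2 / sqrt(n))
p_one_sided <- pt(t_stat, df = n - 1)  # lower-tail area only
p_one_sided
```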

For part 2 (the 95% confidence interval), the assumptions are:

• We assume data are independent
• We assume data are normal

$$\sigma$$ is unknown, so we use $$s$$ and a t distribution with $$n-1=20-1=19$$ degrees of freedom.

$$\bar x \pm t^*_{19} \times SE = \bar x \pm t^*_{19} \times \frac{s}{\sqrt{n}} = 4.6 \pm 2.09 \times \frac{2.2}{\sqrt{20}} = (3.57, 5.63)$$.

$$t^*_{19}$$ is the value such that $$\alpha/2=0.05/2$$ of the probability in the t(19) distribution lies above $$t^*_{19}$$; by symmetry, it is the absolute value of the 0.025 quantile:

```r
qt(0.025, 19)  # lower 2.5% quantile; t* is its absolute value
##  -2.093024
```

We are 95% confident that the average number of years a child in this city takes piano lessons is between 3.57 and 5.63 years.

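For part 3: the results from the hypothesis test and the confidence interval agree, since we did not reject the null hypothesis and the null value of 5 lies inside the 95% confidence interval. A two-sided test at $$\alpha = 0.05$$ and a 95% confidence interval built from the same t distribution always agree in this way.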
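Parts 2 and 3 can be reproduced together. The sketch below (the helper name `t_interval_summary` is hypothetical) builds the t interval from the summary statistics with the unrounded $$t^*_{19} \approx 2.093$$, then checks the agreement described above: for a two-sided test at $$\alpha = 0.05$$, $$\mu_0$$ lies inside the 95% interval exactly when the p-value exceeds 0.05.

```r
# t interval from summary statistics, plus a check of the test/CI agreement
t_interval_summary <- function(xbar, s, n, level = 0.95) {
  t_star <- qt(1 - (1 - level) / 2, df = n - 1)  # positive critical value
  me <- t_star * s / sqrt(n)                     # margin of error
  c(lower = xbar - me, upper = xbar + me)
}

ci <- t_interval_summary(4.6, 2.2, 20)
ci

mu0 <- 5
inside <- ci["lower"] <= mu0 && mu0 <= ci["upper"]
t_stat <- (4.6 - mu0) / (2.2 / sqrt(20))
p_value <- 2 * pt(-abs(t_stat), df = 19)
# "mu0 inside the 95% CI" should match "p-value > 0.05"
inside == (p_value > 0.05)
```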

## 1.7 Car insurance savings (CI means, going backwards)

A market researcher wants to evaluate car insurance savings at a competing company. Based on past studies, he assumes that the standard deviation of savings is USD 100. He wants to collect data such that he can get a margin of error of no more than USD 10 at a 95% confidence level. How large a sample should he collect?

Solution

When the population standard deviation is known, a 95% confidence interval for the population mean is

$\bar X \pm z^* \times SE = \bar X \pm z^* \times \frac{\sigma}{\sqrt{n}}$

where $$z^*$$ is the value such that $$\alpha/2=0.05/2$$ of the probability in the standard normal distribution lies above $$z^*$$; by symmetry, it is the absolute value of the 0.025 quantile:

```r
qnorm(0.025)  # lower 2.5% quantile; z* is its absolute value
##  -1.959964
```

The margin of error is $$z^* \times SE = 1.96 \times \sigma/\sqrt{n} = 1.96 \times 100/\sqrt{n}$$.

We want this value to be no more than 10, which leads to $$1.96 \times 100/\sqrt{n} \leq 10 \Rightarrow \sqrt{n} \geq 1.96 \times 100/10 = 19.6 \Rightarrow n \geq 19.6^2 = 384.16$$.

Thus, we need a sample size of at least 385 (round up for sample size calculations).
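The sample-size calculation generalizes to any margin of error and confidence level. A minimal sketch, assuming $$\sigma$$ is known as in this problem (the function name `n_for_margin` is hypothetical):

```r
# Smallest n such that z* * sigma / sqrt(n) <= target margin of error
n_for_margin <- function(sigma, margin, level = 0.95) {
  z_star <- qnorm(1 - (1 - level) / 2)
  ceiling((z_star * sigma / margin)^2)  # always round up
}

n_for_margin(sigma = 100, margin = 10)
```

With the unrounded $$z^* = 1.959964$$, $$(z^* \times 100/10)^2 \approx 384.15$$, which still rounds up to 385.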

## 1.8 Gaming and distracted eating (HT means difference)

A group of researchers is interested in the possible effects of distracting stimuli during eating, such as an increase or decrease in the amount of food consumed. To investigate this, they monitored food intake for a group of 44 patients who were randomized into two equal groups. The treatment group ate lunch while playing solitaire, and the control group ate lunch without any added distractions. Patients in the treatment group ate an average of 52.1 grams of biscuits, with a standard deviation of 45.1 grams, and patients in the control group ate an average of 27.1 grams of biscuits, with a standard deviation of 26.4 grams. Do these data provide convincing evidence that the average food intake (measured in amount of biscuits consumed) is different for the patients in the treatment group? Assume that conditions for inference are satisfied.

Solution

1. Null and alternative hypotheses

$$H_0: \mu_1 = \mu_2$$
$$H_1: \mu_1 \neq \mu_2$$

or

$$H_0: \mu_1- \mu_2 = 0$$
$$H_1: \mu_1 - \mu_2 \neq 0$$

where $$\mu_1$$ and $$\mu_2$$ are the mean biscuit intakes in the treatment and control groups.

2. Set $$\alpha$$

$$\alpha = 0.05$$

3. Test statistic

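$T = \frac{(\bar X_1 - \bar X_2) - 0}{\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2} }} \sim t(\min(n_1-1,n_2-1))$

Assumptions: independent observations within and between the groups, and approximate normality (normal data or sample sizes $$\geq$$ 30); the problem tells us to assume these conditions are satisfied. Using $$\min(n_1-1, n_2-1)$$ degrees of freedom is a conservative approximation to the null distribution of $$T$$.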

```r
n1 <- 22
x1 <- 52.1
s1 <- 45.1

n2 <- 22
x2 <- 27.1
s2 <- 26.4

t <- (x1-x2)/sqrt(s1^2/n1+s2^2/n2)
t
##  2.243845
```
4. p-value

Probability of observing a value of the test statistic equal to 2.24 or more extreme (in both directions), assuming the null is true

```r
df <- min(n1-1, n2-1) # 21
2*(1-pt(t, df))  # t is positive, so double the upper-tail area
##  0.03575082
```
5. Decision

p-value < $$\alpha$$, so we reject the null. The data provide strong evidence that the average food consumption differs between patients in the treatment and control groups.
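The test above uses $$\min(n_1-1, n_2-1) = 21$$ degrees of freedom, a conservative choice. R's `t.test()` instead defaults to the Welch approximation (`var.equal = FALSE`), whose degrees of freedom come from the Welch-Satterthwaite formula $$\nu = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$. Since only summary statistics are available here, the sketch below computes both by hand for comparison; it illustrates that alternative and is not the method used in the solution.

```r
# Compare the conservative df used above with the Welch-Satterthwaite df
n1 <- 22; x1 <- 52.1; s1 <- 45.1  # treatment (solitaire)
n2 <- 22; x2 <- 27.1; s2 <- 26.4  # control
v1 <- s1^2 / n1
v2 <- s2^2 / n2
t_stat <- (x1 - x2) / sqrt(v1 + v2)

df_conservative <- min(n1 - 1, n2 - 1)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

p_conservative <- 2 * pt(-abs(t_stat), df_conservative)
p_welch <- 2 * pt(-abs(t_stat), df_welch)
c(df_welch = df_welch, p_conservative = p_conservative, p_welch = p_welch)
```

Here the Welch degrees of freedom are about 34, which gives a somewhat smaller p-value; either way the null is rejected at $$\alpha = 0.05$$.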