R Syntax

# Choose a file interactively and store its path in pathfile
pathfile <- file.choose()

# Read CSV file
d <- read.csv(pathfile)

# Read a text file.
# Set header = TRUE if the first row of the file
# contains the variable names
d <- read.table(pathfile, header = TRUE)

# Vector with values 3, 6, 7
c(3, 6, 7)

# Value of data d in row 3 and column 7
d[3, 7]

# Row 3 of data d
d[3, ]

# Column 7 of data d
d[, 7]

# Rows 3 and 5 of data d
d[c(3, 5), ]

# Columns 7 and 9 of data d
d[ , c(7, 9)]

# Data d without rows 3 and 5
d[-c(3, 5), ]

# Data d without columns 7 and 9
d[ , -c(7, 9)]
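To try the indexing commands above without a data file, one option is R's built-in `mtcars` data frame (32 rows, 11 columns). It is used here only as an illustrative stand-in for `d`. Any data frame with at least 5 rows and 9 columns would work the same way.

```r
# Illustrative stand-in for d; in practice d comes from read.csv() or read.table()
d <- mtcars

dim(d)                        # number of rows and columns
d[3, 7]                       # value in row 3, column 7
d[c(3, 5), ]                  # rows 3 and 5
d[, c(7, 9)]                  # columns 7 and 9
d_no_rows <- d[-c(3, 5), ]    # drop rows 3 and 5
d_no_cols <- d[, -c(7, 9)]    # drop columns 7 and 9
dim(d_no_rows)                # two fewer rows than d
dim(d_no_cols)                # two fewer columns than d
```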


If you do not have R installed, you can search the web for “online R compiler” and run the code in your browser.


1 Probability distributions and the Central Limit Theorem

1.1 Area under the curve

What percent of a standard normal distribution \(N(\mu = 0, \sigma = 1)\) is found in each region? Be sure to draw a graph.

  1. \(Z < -1.35\)
  2. \(Z > 1.48\)
  3. \(-0.4 < Z < 1.5\)
  4. \(|Z| > 2\)
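Once each region is sketched, one way to check the areas is R's `pnorm()`, which returns the lower-tail area \(P(Z < z)\) of a normal distribution (standard normal by default). Upper tails and intervals follow from complements and differences of lower tails.

```r
# pnorm(q) gives P(Z < q) for Z ~ N(0, 1)
p1 <- pnorm(-1.35)                 # 1. P(Z < -1.35)
p2 <- 1 - pnorm(1.48)              # 2. P(Z > 1.48): complement of the lower tail
p3 <- pnorm(1.5) - pnorm(-0.4)     # 3. P(-0.4 < Z < 1.5): difference of two lower tails
p4 <- 2 * pnorm(-2)                # 4. P(|Z| > 2): two equal tails, by symmetry

# Express the areas as percentages, rounded to two decimals
round(100 * c(p1, p2, p3, p4), 2)
```

An equivalent way to get an upper tail directly is `pnorm(1.48, lower.tail = FALSE)`.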

1.2 Overweight baggage

Suppose the weights of airline passengers' checked baggage follow a nearly normal distribution with a mean of 45 pounds and a standard deviation of 3.2 pounds. Most airlines charge a fee for baggage that weighs more than 50 pounds. Determine what percent of airline passengers incur this fee.
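A sketch of the calculation in R, treating the "nearly normal" distribution as exactly normal: standardize 50 pounds to a Z-score and take the upper tail, or let `pnorm()` do the standardization through its `mean` and `sd` arguments. The two results should agree.

```r
mu <- 45      # mean baggage weight (pounds)
sigma <- 3.2  # standard deviation (pounds)
limit <- 50   # weight above which a fee is charged (pounds)

z <- (limit - mu) / sigma       # Z-score of the fee threshold
p_fee_z <- 1 - pnorm(z)         # upper-tail area via the standard normal
p_fee <- pnorm(limit, mean = mu, sd = sigma, lower.tail = FALSE)  # same area, unstandardized

c(z = z, percent = 100 * p_fee)
```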

1.3 LA weather

The average daily high temperature in June in LA is 77\(^\circ\)F with a standard deviation of 5\(^\circ\)F. Suppose that the temperatures in June closely follow a normal distribution.

  1. What is the probability of observing an 83\(^\circ\)F temperature or higher in LA during a randomly chosen day in June?
  2. How cool are the coldest 10% of the days (the days with the lowest high temperatures) during June in LA?
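A minimal R sketch for both parts: part 1 is an upper-tail area from `pnorm()`, and part 2 runs the other direction, from an area back to a temperature, which is what `qnorm()` does. Because the normal distribution is continuous, \(P(T \ge 83)\) and \(P(T > 83)\) are the same.

```r
mu <- 77    # mean daily high in June (degrees F)
sigma <- 5  # standard deviation (degrees F)

# 1. P(T >= 83): upper-tail area
p_hot <- pnorm(83, mean = mu, sd = sigma, lower.tail = FALSE)

# 2. The coldest 10% of days lie below the 10th percentile
cutoff <- qnorm(0.10, mean = mu, sd = sigma)

c(p_hot = p_hot, cutoff = cutoff)
```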

1.4 GRE scores

The mean score on the Verbal Reasoning section for all Graduate Record Examination (GRE) takers was 151 with a standard deviation of 7, and the mean score on the Quantitative Reasoning section was 153 with a standard deviation of 7.67. Suppose that both distributions are nearly normal.

  1. Write down the shorthand notation for these two normal distributions.
  2. What is the score of a student who scored in the 80th percentile on the Quantitative Reasoning section?
  3. What is the score of a student who scored worse than 70% of the test takers on the Verbal Reasoning section?
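Parts 2 and 3 ask for scores given percentiles, so `qnorm()` (the inverse of `pnorm()`) is the natural tool. One step worth making explicit: scoring worse than 70% of test takers means 30% scored lower, so part 3 asks for the 30th percentile. A sketch, treating both distributions as exactly normal:

```r
# Verbal Reasoning:       N(mean = 151, sd = 7)
# Quantitative Reasoning: N(mean = 153, sd = 7.67)

# 2. 80th percentile of the Quantitative Reasoning scores
q80 <- qnorm(0.80, mean = 153, sd = 7.67)

# 3. Worse than 70% of test takers => 30% scored lower => 30th percentile
v30 <- qnorm(0.30, mean = 151, sd = 7)

round(c(quant_80th = q80, verbal_30th = v30), 1)
```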

1.5 Hen eggs (CLT)

The distribution of the number of eggs laid by a certain species of hen during their breeding period has a mean of 35 eggs with a standard deviation of 18.2. Suppose a group of researchers randomly samples 45 hens of this species, counts the number of eggs laid during their breeding period, and records the sample mean. They repeat this 1,000 times, and build a distribution of sample means.

  1. What is this distribution called?
  2. Would you expect the shape of this distribution to be symmetric, right skewed, or left skewed? Explain your reasoning.
  3. Calculate the variability of this distribution and state the appropriate term used to refer to this value.
  4. Suppose the researchers’ budget is reduced and they are only able to collect random samples of 30 hens. They record the sample mean number of eggs, repeat this 1,000 times, and build a new distribution of sample means. How will the variability of this new distribution compare to the variability of the original distribution?
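The spread of a distribution of sample means is \(\sigma/\sqrt{n}\). The sketch below computes it for \(n = 45\) and \(n = 30\) and, as a sanity check, simulates 1,000 sample means for each. The problem does not describe the shape of the population, so the simulation assumes, purely for illustration, a right-skewed gamma distribution with mean 35 and standard deviation 18.2; the formula itself does not depend on that choice.

```r
mu <- 35       # population mean number of eggs per hen
sigma <- 18.2  # population standard deviation

# Spread of the sample mean: sigma / sqrt(n)
se_45 <- sigma / sqrt(45)
se_30 <- sigma / sqrt(30)

# ASSUMPTION (illustration only): population is gamma with mean mu and sd sigma.
# For a gamma distribution, mean = shape * scale and variance = shape * scale^2.
shape <- (mu / sigma)^2
scale <- sigma^2 / mu

set.seed(1)  # make the simulation reproducible
sample_means_45 <- replicate(1000, mean(rgamma(45, shape = shape, scale = scale)))
sample_means_30 <- replicate(1000, mean(rgamma(30, shape = shape, scale = scale)))

# Compare the formula with the simulated spread of the 1,000 sample means
c(formula_45 = se_45, simulated_45 = sd(sample_means_45),
  formula_30 = se_30, simulated_30 = sd(sample_means_30))
```

Calling `hist(sample_means_45)` afterwards draws the simulated distribution, which can help with question 2.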

1.6 Identify the parameter (Inference)

For each of the following situations, state whether the parameter of interest is a mean or a proportion. It may be helpful to examine whether individual responses are numerical or categorical.

  1. In a survey, one hundred college students are asked how many hours per week they spend on the Internet.
  2. In a survey, one hundred college students are asked: “What percentage of the time you spend on the Internet is part of your coursework?”
  3. In a survey, one hundred college students are asked whether or not they cited information from Wikipedia in their papers.
  4. In a sample of one hundred recent college graduates, it is found that 85 percent expect to get a job within one year of their graduation date.

1.7 Quality control (Inference)

As part of a quality control process for computer chips, an engineer at a factory randomly samples 212 chips during a week of production to test the current rate of chips with severe defects. She finds that 27 of the chips are defective.

  1. What population is under consideration in the data set?
  2. What parameter is being estimated?
  3. What is the point estimate for the parameter?
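For a proportion, the point estimate is the sample proportion \(\hat{p}\), the number of chips with severe defects divided by the number of chips sampled. A minimal R check of the arithmetic:

```r
n_sampled <- 212   # chips sampled during the week
n_defective <- 27  # sampled chips with severe defects

# Point estimate of the defect rate: the sample proportion p-hat
p_hat <- n_defective / n_sampled
round(p_hat, 4)
```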