A random process is a random phenomenon or experiment that can have a range of outcomes; for example, tossing a coin, rolling a die, or measuring the height of a randomly selected individual.

A **random variable** is a variable whose possible values are numbers associated with the outcomes of a random process. Random variables are usually denoted with capital letters such as \(X\) or \(Y\). There are two types of random variables: discrete and continuous.

A **discrete** random variable is one that may take only a finite or countably infinite set of values. For example:

- \(X =\) outcome obtained when tossing a coin (e.g., \(X=1\) if heads and \(X = 0\) if tails)
- \(X =\) number of members in a family randomly selected in the USA (e.g., 1, 2, 3)
- \(X =\) number of patients admitted to a hospital on a given day
- \(X =\) number of defective light bulbs in a box of 20
- \(X =\) year that a student randomly selected in the University was born (e.g., 1998, 2000)

A **continuous** random variable is one that takes an uncountably infinite number of possible values. For example:

- \(X\) = height of a student randomly selected in the department (e.g., 1.65 m, 1.789 m)
- \(X\) = time required to run 1 km (e.g., 4.56 min, 5.123 min)
- \(X\) = amount of sugar in an orange (e.g., 9.21 g, 12.2 g)

A probability distribution of a random variable \(X\) is the specification of all the possible values of \(X\) and the probability associated with those values.

If \(X\) is **discrete**, the probability distribution is called the **probability mass function**. We can describe the probability mass function of \(X\) by making a table with all the possible values of \(X\) and their associated probabilities. We can also represent the probability mass function graphically using a barplot.

Let \(X\) be a discrete random variable that may take \(n\) different values \(X = x_i\) with probability \(P(X = x_i) = p_i\), \(i =1,\ldots,n\). The probability mass function of \(X\) can be represented as follows:

Value of \(X\) | \(P(X = x_i)\) |
---|---|
\(x_1\) | \(p_1\) |
\(x_2\) | \(p_2\) |
… | … |
\(x_n\) | \(p_n\) |

The probability mass function must satisfy the following:

- \(0 < p_i < 1\) for all \(i\)
- \(\sum_{i=1}^n p_i = p_1 + p_2 + \ldots + p_n = 1\)

**Example**

Let \(X\) be a discrete random variable with the following probability mass function.

Outcome | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Probability | 0.20 | 0.40 | 0.10 | 0.10 | 0.13 | 0.07 |

- Check the validity of the probability mass function.

All probabilities lie between 0 and 1, and they sum to 1 (\(0.20+0.40+0.10+0.10+0.13+0.07 = 1\)), so the probability mass function is valid.
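These checks can be done directly in R; a small sketch using the probabilities from the table:

```r
# Probabilities from the table above
p <- c(0.20, 0.40, 0.10, 0.10, 0.13, 0.07)
all(p > 0 & p < 1)  # TRUE: each probability is between 0 and 1
sum(p)              # 1: the probabilities sum to one
```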

- Calculate the probability that \(X\) is equal to 2 or 3.

P(X = 2 or X = 3) = P(X = 2) + P(X = 3) = 0.4 + 0.1 = 0.5.

- Calculate the probability that \(X\) is greater than 1

P(X > 1) = 1 - P(X = 1) = 1 - 0.2 = 0.8
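As a sketch, the same probabilities can be computed in R by summing the relevant entries of the probability vector:

```r
outcome <- 1:6
p <- c(0.20, 0.40, 0.10, 0.10, 0.13, 0.07)
sum(p[outcome %in% c(2, 3)])  # P(X = 2 or X = 3) = 0.5
sum(p[outcome > 1])           # P(X > 1) = 0.8
```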

If \(X\) is **continuous**, the probability distribution is called the **probability density function**. A continuous variable \(X\) takes an infinite number of possible values, and the probability of observing any single value is equal to 0.

Therefore, instead of discussing the probability of \(X\) at a specific value \(x\), we deal with \(f(x)\), the probability density function of \(X\) at \(x\).

We cannot work with probabilities of \(X\) at specific values, but we can assign probabilities to intervals of values. The probability of \(X\) being between \(a\) and \(b\) is given by the area under the probability density curve from \(a\) to \(b\).

\[P(a \leq X\leq b) = \int_a^b f(x)dx\]

The probability density function \(f(x)\) must satisfy the following:

- The probability density function has no negative values (\(f(x) \geq 0\) for all \(x\))
- The total area under the curve is equal to 1 (\(\int_{-\infty}^{\infty} f(x)dx = 1\))
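As an illustration (using the standard normal density, introduced later in these notes), the area interpretation can be checked numerically in R: integrating the density over an interval gives the same probability as a difference of cumulative probabilities.

```r
# P(-1 <= X <= 1) for a standard normal, as an area under the density
integrate(dnorm, lower = -1, upper = 1)$value
# The same probability via the cumulative distribution function
pnorm(1) - pnorm(-1)
```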

The cumulative distribution function (CDF) of a random variable \(X\) denotes the probability that \(X\) takes a value less than or equal to \(x\), for every value of \(x\).

If \(X\) is **discrete**, the cumulative distribution function at a value \(x\) is calculated as the sum of the probabilities of the values that are less than or equal to \(x\):

\[F(x)=P(X\leq x) = \sum_{a\leq x}P(X=a)\]

If \(X\) is **continuous**, the cumulative distribution function at value \(x\) is calculated as the area under the probability density function to the left of \(x\).

\[F(x)=P(X\leq x) = \int_{-\infty}^x f(a)da\]

The probability that a continuous variable takes values between \(a\) and \(b\) can be expressed as \(P(a \leq X \leq b) = F(b) - F(a)\).

**Example**

Calculate and represent graphically the cumulative distribution function for the discrete random variable given by the following probability mass function:

Outcome | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Probability | 0.20 | 0.40 | 0.10 | 0.10 | 0.13 | 0.07 |

\(F(1) = P(X \leq 1) = 0.20\)

\(F(2) = P(X \leq 2) = 0.20+0.40 = 0.60\)

\(F(3) = P(X \leq 3) = 0.20+0.40+0.10 = 0.70\)

\(F(4) = P(X \leq 4) = 0.20+0.40+0.10+0.10=0.80\)

\(F(5) = P(X \leq 5) = 0.20+0.40+0.10+0.10+0.13=0.93\)

\(F(6) = P(X \leq 6) = 0.20+0.40+0.10+0.10+0.13+0.07=1\)
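In R, these cumulative probabilities are simply the cumulative sums of the probability mass function (a sketch):

```r
p <- c(0.20, 0.40, 0.10, 0.10, 0.13, 0.07)
cumsum(p)  # F(1), F(2), ..., F(6): 0.20 0.60 0.70 0.80 0.93 1.00
```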

The Bernoulli distribution is used to describe experiments having exactly two outcomes (e.g., the toss of a coin will be heads or tails, a person will test positive for a disease or not, or a political party will win an election or not).

If \(X\) is a random variable that has a Bernoulli distribution with probability of success \(p\), we write \[X \sim Ber(p)\]

- Each trial can result in one of two possible outcomes: success (\(X=1\)) or failure (\(X=0\))
- The probability of success \(p\) is the same for each trial (\(0 < p < 1\))
- The outcome of one trial has no influence on later outcomes (trials are independent)

Probability mass function: \[P(X = x) = p^x (1-p)^{1-x},\ x \in \{0, 1\}\]

We can check this: if \(x=0\), \(P(X=0) = p^0 (1-p)^{1-0}=1-p\), and if \(x=1\), \(P(X=1) = p^1 (1-p)^{0}=p\).

The mean is \(E[X]= \sum_{i} x_i P(X=x_i) = 0 \cdot (1-p)+ 1 \cdot p=p\)

The variance is \(Var[X] = E[(X-E[X])^2] = E[X^2]-E[X]^2 = 0^2 (1-p)+ 1^2 p - p^2 = p(1-p)\)
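The mean and variance can be checked by simulation; a Bernoulli variable can be generated in R as a Binomial with a single trial (`rbinom` with `size = 1`). The value of \(p\) below is just an illustration.

```r
set.seed(1)
p <- 0.3
x <- rbinom(1e5, size = 1, prob = p)  # 100000 Bernoulli(0.3) draws
mean(x)  # close to p = 0.3
var(x)   # close to p * (1 - p) = 0.21
```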


The binomial distribution is used to describe the number of successes in a fixed number of independent Bernoulli trials (e.g., the number of heads when tossing a coin 20 times, or the number of people that test positive for a disease out of 100 people tested).

If \(X\) is a random variable that has a Binomial distribution with number of trials \(n\) and probability of success on a single trial \(p\), we write

\[X \sim Binomial(n,p)\]

- \(x=0,1,2,\ldots,n\) is the number of successes
- The number of trials \(n\) is fixed
- The probability of success \(p\) is the same from one trial to another
- Each trial is independent (no trial has an effect on the probability of the next trial)

Probability mass function: the probability of having \(x\) successful outcomes in an experiment of \(n\) independent trials with probability of success \(p\) is \[P(X=x) = \binom{n}{x} p^x (1-p)^{n-x}\]

\(x\) successes occur with probability \(p^x\) and \(n -x\) failures occur with probability \((1 - p)^{n - x}\). The \(x\) successes can occur anywhere among the \(n\) trials, and there are \(\binom{n}{x}\) (\(n\) choose \(x\)) ways to get \(x\) successes in a sequence of \(n\) trials, where \(\displaystyle{\binom{n}{x} = \frac{n!}{x!(n-x)!}}\) and \(n! = n \times (n-1) \times \ldots \times 1\) (\(n\) factorial).

The mean is \(n p\). The variance is \(n p (1-p)\).
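As a sketch, the formula can be evaluated by hand in R and compared with the built-in `dbinom()` (the values of \(n\), \(p\), and \(x\) below are arbitrary):

```r
n <- 10; p <- 0.3; x <- 4
choose(n, x) * p^x * (1 - p)^(n - x)  # probability from the formula
dbinom(x, size = n, prob = p)         # same value from R's built-in PMF
```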


A Binomial random variable is the sum of independent, identically distributed Bernoulli random variables.

Let \(X_1, X_2, \ldots, X_n\) be independent Bernoulli random variables, each with the same parameter \(p\). Then the sum \(X = X_1 + \ldots + X_n\) is a Binomial random variable with parameters \(n\) and \(p\).
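This relationship can be illustrated by simulation: summing \(n\) Bernoulli draws many times gives a sample whose mean is close to \(np\) (the parameters below are illustrative).

```r
set.seed(42)
n <- 10; p <- 0.7
# 10000 replicates of the sum of n independent Bernoulli(p) variables
sums <- replicate(1e4, sum(rbinom(n, size = 1, prob = p)))
mean(sums)  # close to n * p = 7
```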

**Example**

Suppose that a coin with probability of heads \(p\) is tossed 3 times. The probability of obtaining 2 heads in the first two tosses and 1 tail in the third toss is \[P(X_1 = 1, X_2 =1, X_3 = 0) = p \times p \times (1-p) = p^2 (1-p)\] (tosses are independent, so probabilities are multiplied)

There are 3 possible ways we obtain 2 heads and 1 tail in three tosses:

\(P\left(\sum_{i=1}^3 X_i = 2\right) =\)

\(P(X_1 = 1, X_2 =1, X_3 = 0) + P(X_1 = 1, X_2 =0, X_3 = 1) + P(X_1 = 0, X_2 =1, X_3 = 1) =\)

\(3 p^2 (1-p)\)

In general, the probability of obtaining \(x\) heads in \(n\) tosses is \[P\left(\sum_{i=1}^n X_i = x\right) = \binom{n}{x} p^x (1-p)^{n-x}\]
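A quick numerical check of this derivation (with an illustrative value of \(p\)):

```r
p <- 0.5
3 * p^2 * (1 - p)              # 0.375, counting the three sequences by hand
dbinom(2, size = 3, prob = p)  # 0.375, the Binomial PMF gives the same value
```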

Purpose | Function | Example |
---|---|---|
Generate `n` random values from a Binomial distribution | `rbinom(n, size, prob)` | `rbinom(1000, 12, 0.25)` generates 1000 values from a Binomial distribution with 12 trials and probability of success 0.25 |
Probability Mass Function | `dbinom(x, size, prob)` | `dbinom(2, 12, 0.25)` probability of obtaining 2 successes when the number of trials is 12 and the probability of success is 0.25 |
Cumulative Distribution Function (CDF) | `pbinom(q, size, prob)` | `pbinom(2, 12, 0.25)` probability of observing 2 or fewer successes when the number of trials is 12 and the probability of success is 0.25 |
Quantile Function (inverse of `pbinom()`) | `qbinom(p, size, prob)` | `qbinom(0.98, 12, 0.25)` value at which the CDF of the Binomial distribution with 12 trials and probability of success 0.25 is equal to 0.98 |

**Example**

Let us consider a biased coin that comes up heads with probability 0.7 when tossed. Let \(X\) be the random variable denoting the number of heads obtained when the coin is tossed 10 times, \(X \sim Bin(n = 10, p = 0.7)\).

- What is the probability of obtaining 4 heads in 10 tosses?

\(P(X = 4)\)

`dbinom(4, size = 10, prob = 0.7)`

`## [1] 0.03675691`

- Calculate \(P(X \leq 4)\)

\(P(X \leq 4) = P(X = 0)+P(X = 1)+P(X = 2)+P(X = 3)+P(X = 4)\)

```
dbinom(0, size = 10, prob = 0.7) +
dbinom(1, size = 10, prob = 0.7) +
dbinom(2, size = 10, prob = 0.7) +
dbinom(3, size = 10, prob = 0.7) +
dbinom(4, size = 10, prob = 0.7)
```

`## [1] 0.04734899`

Alternatively, using the cumulative distribution function

`pbinom(4, size = 10, prob = 0.7)`

`## [1] 0.04734899`

- Generate 3 random values from \(X \sim Bin(n = 10, p = 0.7)\)

`rbinom(3, size = 10, prob = 0.7)`

`## [1] 8 6 9`

The Normal distribution is used to model many processes in nature and industry (e.g., heights, blood pressure, IQ scores, package delivery time, stock volatility).

If \(X\) is a random variable that has a normal distribution with mean \(\mu\) and variance \(\sigma^2\), we write \[X \sim N(\mu, \sigma^2)\]

\(x \in \mathbb{R}\)

Probability density function: \(\displaystyle{f(x)=\frac{1}{\sigma \sqrt{2\pi}} e^{-(x-\mu)^2/(2 \sigma^2)}}\)

- Bell-shaped density function
- Single peak at the mean (most data values occur around the mean)
- Symmetrical, centered at the mean

\(\mu \in \mathbb{R}\) represents the mean and the median of the normal distribution. The normal distribution is symmetric about the mean. Most values are around the mean, and half of the values are above the mean and half of the values below the mean. Changing the mean shifts the bell curve to the left or right.

\(\sigma>0\) is the standard deviation (\(\sigma^2>0\) variance). The standard deviation denotes how spread out the data are. Changing the standard deviation stretches or constricts the curve.

The standard normal distribution is a normal distribution with mean 0 and variance 1.

The standard normal distribution is represented with the letter \(Z\). \[Z \sim N(\mu=0, \sigma^2=1)\]

If \(X \sim N(\mu, \sigma^2)\), then \(\displaystyle{Z = \frac{X-\mu}{\sigma} \sim N(0, 1)}\)
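A sketch of this standardization in R, using illustrative values \(\mu = 100\) and \(\sigma = 15\): the probability of an interval is unchanged when \(X\) is converted to a \(Z\)-score.

```r
# P(X < 110) for X ~ N(100, 15^2), computed directly
pnorm(110, mean = 100, sd = 15)
# The same probability via the standardized value z = (110 - 100) / 15
pnorm((110 - 100) / 15)
```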

The normal distribution is symmetric about the mean \(\mu\). The total area under the probability density curve is 1. The probability below the mean is 0.5 and the probability above the mean is 0.5.

The 68-95-99.7 rule is used to remember the percentage of values that lie within a band around the mean in a normal distribution with a width of two, four and six standard deviations, respectively.

Approximately 68% of the area lies between one standard deviation below the mean (\(\mu-\sigma\)) and one standard deviation above the mean (\(\mu+\sigma\)). That is, approximately 68% of the data lies within 1 standard deviation from the mean.

Approximately 95% of the data lies within 2 standard deviations from the mean.

Approximately 99.7% of the data lies within 3 standard deviations from the mean.
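The rule can be verified with `pnorm()` for the standard normal; by standardization the same percentages hold for any normal distribution.

```r
pnorm(1) - pnorm(-1)  # about 0.683: within 1 standard deviation of the mean
pnorm(2) - pnorm(-2)  # about 0.954: within 2 standard deviations
pnorm(3) - pnorm(-3)  # about 0.997: within 3 standard deviations
```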

Purpose | Function | Example |
---|---|---|
Generate `n` random values from a Normal distribution | `rnorm(n, mean, sd)` | `rnorm(1000, 2, .25)` generates 1000 values from a Normal distribution with mean 2 and standard deviation 0.25 |
Probability Density Function | `dnorm(x, mean, sd)` | `dnorm(0, 0, 0.5)` density at value 0 (height of the probability density function at value 0) of the Normal distribution with mean 0 and standard deviation 0.5 |
Cumulative Distribution Function (CDF) | `pnorm(q, mean, sd)` | `pnorm(1.96, 0, 1)` area under the density function of the standard normal to the left of 1.96 (= 0.975) |
Quantile Function (inverse of `pnorm()`) | `qnorm(p, mean, sd)` | `qnorm(0.975, 0, 1)` value at which the CDF of the standard normal distribution is equal to 0.975 (= 1.96) |

**Example**

Consider a random variable \(X \sim N(\mu=100, \sigma^2=15^2)\), that is, with mean 100 and standard deviation 15. Calculate the following:

- \(P(X < 125)\)

`pnorm(125, mean = 100, sd = 15)`

`## [1] 0.9522096`

- \(P(X \geq 110) = 1 - P(X < 110)\)

`1 - pnorm(110, mean = 100, sd = 15)`

`## [1] 0.2524925`

- \(P(110 < X < 125) = P(X < 125) - P(X < 110)\)

```
pnorm(125, mean = 100, sd = 15) -
pnorm(110, mean = 100, sd = 15)
```

`## [1] 0.2047022`

The \(k\)th percentile is the value \(x\) such that \(P(X < x) = k/100\).

The \(k\)th percentile of a set of values divides them so that \(k\)% of the values lie below and (100-\(k\))% of the values lie above.

Quantiles are the same as percentiles, but are indexed by sample fractions rather than by sample percentages (e.g., the 10th percentile is the 0.10 quantile).
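In R, `qnorm()` and `pnorm()` are inverses of each other, which makes the percentile/quantile relationship concrete (the standard normal is used here as an illustration):

```r
q <- qnorm(0.10)  # the 0.10 quantile (10th percentile) of the standard normal
pnorm(q)          # recovers 0.10: the probability of being below that value
```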

**Example**

The mean body mass index (BMI) for men aged 60 is 29 with a standard deviation of 6. Recall that BMI is a value derived from a person's weight and height, used to assess whether their weight is healthy; it is expressed in kg/m\(^2\).

- What is the 90th percentile (or quantile 0.90)?

\(X \sim N(\mu=29, \sigma^2 = 6^2)\). The 90th percentile is the value \(x\) such that \(P(X < x) = 0.90\). The 90th percentile is 36.69. This means 90% of the BMIs in men aged 60 are below 36.69, and 10% are above 36.69.

`qnorm(0.90, mean = 29, sd = 6)`

`## [1] 36.68931`

- Find the 25th percentile (or quantile 0.25).

Value \(x\) such that \(P(X < x) = 0.25\).

`qnorm(0.25, mean = 29, sd = 6)`

`## [1] 24.95306`

- For infant girls, the mean body length at 10 months is 72 centimeters with a standard deviation of 3 centimeters. Suppose a girl of 10 months has a measured length of 67 centimeters. How does her length compare to other girls of 10 months?

\(X \sim N(\mu=72, \sigma^2 = 3^2)\). We can compute her percentile by determining the proportion of girls with lengths below 67. Specifically, \(P(X < 67) = 0.047\). This girl is in the 4.7th percentile among her peers; her length is well below average.

`pnorm(67, mean = 72, sd = 3)`

`## [1] 0.04779035`

The F distribution is usually defined as the distribution of the ratio of the variances of two normally distributed populations. The F distribution depends on two parameters: the degrees of freedom of the numerator and the degrees of freedom of the denominator.

Characteristics of the F distribution:

- The F-distribution is always \(\geq 0\), since it is a ratio of variances, and variances (being sums of squared deviations) are non-negative
- The density is skewed to the right
- The shape changes depending on the numerator and denominator degrees of freedom
- As the degrees of freedom of the numerator and denominator get larger, the density approximates the normal
- In ANOVA we always use the right-tailed area to calculate p-values

Density curves of four different F-distributions:

R syntax

Purpose | Function | Example |
---|---|---|
Generate `n` random values from an F distribution | `rf(n, df1, df2)` | `rf(1000, 2, 11)` generates 1000 values from an F distribution with degrees of freedom 2 and 11 |
Probability Density Function (PDF) | `df(x, df1, df2)` | `df(1, 2, 11)` density at value 1 (height of the PDF at value 1) of the F distribution with degrees of freedom 2 and 11 |
Cumulative Distribution Function (CDF) | `pf(q, df1, df2)` | `pf(5, 2, 11)` area under the density function of the F(2,11) to the left of 5 |
Quantile Function (inverse of `pf()`) | `qf(p, df1, df2)` | `qf(0.97, 2, 11)` value at which the CDF of the F(2,11) distribution is equal to 0.97 |
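An F variable can also be written as a ratio of independent chi-squared variables (introduced below), each divided by its degrees of freedom. A simulation sketch under that standard construction, with illustrative degrees of freedom:

```r
set.seed(3)
df1 <- 2; df2 <- 11
# Ratio of independent chi-squared variables, each divided by its df
f <- (rchisq(1e4, df1) / df1) / (rchisq(1e4, df2) / df2)
mean(f)                       # close to df2 / (df2 - 2) = 11/9 for df2 > 2
mean(f > qf(0.95, df1, df2))  # about 0.05: the right-tail area used in ANOVA
```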

The chi-squared distribution with \(m \in \mathbb{N}^*\) degrees of freedom is the distribution of a sum of squares of \(m\) independent standard normal random variables.

If \(X_1, X_2, \ldots, X_m\) are \(m\) independent random variables having the standard normal distribution, then \[V=X_1^2 + X_2^2+ \ldots + X_m^2 \sim \chi^2_{(m)}\] follows a chi-squared distribution with \(m\) degrees of freedom. Its mean is \(m\) and its variance is \(2m\).
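This definition can be checked by simulation: summing \(m\) squared standard normal draws repeatedly gives a sample whose mean is close to \(m\) and whose variance is close to \(2m\) (\(m = 6\) below is illustrative).

```r
set.seed(7)
m <- 6
z <- matrix(rnorm(m * 1e4), nrow = m)  # 10000 columns of m standard normals
v <- colSums(z^2)                      # each column sum is a chi-squared(m) draw
mean(v)  # close to m = 6
var(v)   # close to 2 * m = 12
```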

```
par(mfrow = c(2, 2)) # draw the figures in a 2 by 2 array on the device
x <- seq(0, 6, length.out = 100)
plot(x, dchisq(x, 1), main = "chi2(1)", type = "l")
plot(x, dchisq(x, 2), main = "chi2(2)", type = "l")
plot(x, dchisq(x, 3), main = "chi2(3)", type = "l")
plot(x, dchisq(x, 6), main = "chi2(6)", type = "l")
```

**Example**

Find the 95th percentile (or quantile 0.95) of the chi-squared distribution with 6 degrees of freedom.

`qchisq(0.95, df = 6)`

`## [1] 12.59159`