Applied Statistics and Data Analysis

Class meetings

July 28-31, 2022

9:00 am - 4:00 pm

Lecturer

Dr. Paula Moraga

E-mail:

Website: https://www.paulamoraga.com/

Teaching Assistant

Mr. André Victor Ribeiro Amaral

E-mail:

Website: https://www.avramaral.com/

Course materials

https://www.paulamoraga.com/course-aramco

Datasets

Practicals

https://avramaral.github.io/aramco_course

Method of evaluation

35% - Homework 1
35% - Homework 2
30% - Exam

Homework 1: Set on 29th July and due on 6th August
Homework 2: Set on 31st July and due on 6th August

Exam: Monday 8th August, 5:30 pm - 7:00 pm

Course overview

Objective: This course is an introduction to the formulation and use of linear models and their generalizations, including parameter estimation and inference for such models in a variety of settings. Emphasis will be split between understanding the theoretical foundations of the models and applying them to answer scientific questions.

The course covers a range of statistical methods, enabling students to understand and implement techniques for analyzing data using the statistical software R. The objectives are:

  • Learning how to construct point and interval estimates of population parameters, and testing hypotheses using samples from the population.
    Example: Using information from a survey on chronic conditions, provide a point estimate and a 95% confidence interval for the proportion of UK adults who live with one or more chronic conditions.

  • Learning how to conduct simple and multiple regression analyses to quantify the effects of explanatory variables on a response variable and make predictions.
    Example: How does physical activity relate to body mass index (kg/m^2)?
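
The two example questions above can be sketched in a few lines of base R. This is only an illustrative sketch, not the course's own practical material: the survey counts (n, x) are hypothetical, the activity and BMI data are simulated with arbitrary parameters, and the variable names are chosen for this example. The first part uses the simple normal-approximation (Wald) interval for a proportion; the second fits a simple linear regression with lm() and makes one prediction.

```r
# --- Objective 1: point estimate and 95% confidence interval for a proportion ---
# HYPOTHETICAL counts, invented for illustration only (not real survey data).
n <- 1000   # adults surveyed
x <- 430    # respondents reporting one or more chronic conditions

p_hat <- x / n                           # point estimate of the proportion
se    <- sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
z     <- qnorm(0.975)                    # normal critical value for 95% confidence
ci    <- c(lower = p_hat - z * se, upper = p_hat + z * se)  # Wald interval

print(p_hat)
print(round(ci, 3))

# --- Objective 2: simple linear regression and prediction ---
# SIMULATED data: the intercept, slope, and noise level are arbitrary assumptions.
set.seed(1)
activity <- runif(100, min = 0, max = 10)             # hours of activity per week
bmi      <- 28 - 0.4 * activity + rnorm(100, sd = 2)  # body mass index (kg/m^2)
dat      <- data.frame(activity = activity, bmi = bmi)

fit <- lm(bmi ~ activity, data = dat)   # least-squares fit
print(coef(fit))                        # estimated intercept and slope
print(confint(fit, level = 0.95))       # 95% confidence intervals for both

# Predicted BMI, with a 95% prediction interval, for 5 hours of activity per week
pred <- predict(fit, newdata = data.frame(activity = 5),
                interval = "prediction", level = 0.95)
print(pred)
```

The Wald interval is used here because it is the easiest to compute by hand; base R's prop.test(x, n) reports a score-based interval instead, so its limits will differ slightly from the ones printed above.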

Course schedule

July 28: Statistical inference foundations

9:00 am - 10:00 am Probability distributions
10:00 am - 10:45 am Descriptive analysis
10:45 am - 11:00 am Coffee break
11:00 am - 12:00 pm Central limit theorem
12:00 pm - 1:00 pm Lunch break
1:00 pm - 2:30 pm Confidence intervals
2:30 pm - 2:45 pm Coffee break
2:45 pm - 4:00 pm Hypothesis testing


July 29: Multiple linear regression

9:00 am - 10:00 am Simple linear regression
10:00 am - 10:45 am Multiple linear regression
10:45 am - 11:00 am Coffee break
11:00 am - 12:00 pm Model assumptions, unusual observations
12:00 pm - 1:00 pm Lunch break
1:00 pm - 2:30 pm Model selection
2:30 pm - 2:45 pm Coffee break
2:45 pm - 4:00 pm Practical


July 30: Generalized linear models (GLMs)

9:00 am - 10:00 am Logistic regression
10:00 am - 10:45 am Poisson regression
10:45 am - 11:00 am Coffee break
11:00 am - 12:00 pm Sensitivity and specificity. ROC curve
12:00 pm - 1:00 pm Lunch break
1:00 pm - 2:30 pm Practical
2:30 pm - 2:45 pm Coffee break
2:45 pm - 4:00 pm Practical


July 31: Categorical variables and interactions. Spatial analysis

9:00 am - 10:00 am Categorical variables
10:00 am - 10:45 am Interactions
10:45 am - 11:00 am Coffee break
11:00 am - 12:00 pm Practical
12:00 pm - 1:00 pm Lunch break
1:00 pm - 2:30 pm Spatial analysis
2:30 pm - 2:45 pm Coffee break
2:45 pm - 3:30 pm Spatial analysis
3:30 pm - 4:00 pm Wrap-up

References

  1. Faraway, J. J. (2005). Linear Models with R. Chapman & Hall/CRC.
  2. Faraway, J. J. (2006). Extending the Linear Model with R. Chapman & Hall/CRC.

License

You may not copy or distribute the course materials.