# Applied Statistics and Data Analysis

Class meetings

July 28-31, 2022

9:00 am - 4:00 pm

Lecturer

Dr. Paula Moraga

E-mail:

Website: https://www.paulamoraga.com/

Teaching Assistant

Mr. André Victor Ribeiro Amaral

Website: https://www.avramaral.com/

Course materials

https://www.paulamoraga.com/course-aramco

Datasets

Practicals

https://avramaral.github.io/aramco_course

## Method of evaluation

• 35% - Homework 1
• 35% - Homework 2
• 30% - Exam

Homework 1: Set on 29th July and due on 6th August
Homework 2: Set on 31st July and due on 6th August

Exam: Monday 8th August, 5:30 pm - 7:00 pm

## Course overview

Objective: This course is an introduction to the formulation and use of linear models (and generalizations) including parameter estimation and inference for such models in a variety of settings. Emphasis will be split between understanding the theoretical foundations of the models and the ability to apply the models to answer scientific questions.

The course covers a range of statistical methods, enabling participants to understand and implement techniques for analyzing data with the statistical software R. The objectives are:

• Learning how to construct point and interval estimates of population parameters, and how to test hypotheses using samples from the population.
Example: Using data from a survey on chronic conditions, provide a point estimate and a 95% confidence interval for the proportion of UK adults who live with one or more chronic conditions.

• Learning how to conduct simple and multiple regression analyses to quantify the effects of explanatory variables on a response variable and make predictions.
Example: How does physical activity relate to body mass index (kg/m^2)?
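The two example questions above can be sketched in a few lines of code. The course itself uses R; the Python below is only an illustrative sketch using the standard library. The function names `proportion_ci` and `simple_linear_regression`, the Wald (normal-approximation) interval, and all numbers are assumptions made for illustration; they are not course data or the course's prescribed method.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Point estimate and Wald (normal-approximation) interval for a proportion."""
    p_hat = successes / n
    # Standard normal quantile for a two-sided interval (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - z * se, p_hat + z * se


def simple_linear_regression(x, y):
    """Least-squares intercept and slope for the model y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical survey: 400 of 1000 respondents report a chronic condition.
p_hat, lower, upper = proportion_ci(400, 1000)
print(f"estimate = {p_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")

# Hypothetical data: weekly hours of physical activity and BMI (kg/m^2).
hours = [0, 1, 2, 3]
bmi = [30, 28, 26, 24]
b0, b1 = simple_linear_regression(hours, bmi)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

The slope estimates the change in body mass index associated with one additional unit of physical activity. In R, the regression corresponds to a call of the form `lm(y ~ x)`.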

## Course schedule

July 28: Statistical inference foundations

| Time | Session |
| --- | --- |
| 9:00 am - 10:00 am | Probability distributions |
| 10:00 am - 10:45 am | Descriptive analysis |
| 10:45 am - 11:00 am | Coffee break |
| 11:00 am - 12:00 pm | Central limit theorem |
| 12:00 pm - 1:00 pm | Lunch break |
| 1:00 pm - 2:30 pm | Confidence intervals |
| 2:30 pm - 2:45 pm | Coffee break |
| 2:45 pm - 4:00 pm | Hypothesis testing |

July 29: Multiple linear regression

| Time | Session |
| --- | --- |
| 9:00 am - 10:00 am | Simple linear regression |
| 10:00 am - 10:45 am | Multiple linear regression |
| 10:45 am - 11:00 am | Coffee break |
| 11:00 am - 12:00 pm | Model assumptions, unusual observations |
| 12:00 pm - 1:00 pm | Lunch break |
| 1:00 pm - 2:30 pm | Model selection |
| 2:30 pm - 2:45 pm | Coffee break |
| 2:45 pm - 4:00 pm | Practical |

July 30: Generalized linear models (GLMs)

| Time | Session |
| --- | --- |
| 9:00 am - 10:00 am | Logistic regression |
| 10:00 am - 10:45 am | Poisson regression |
| 10:45 am - 11:00 am | Coffee break |
| 11:00 am - 12:00 pm | Sensitivity and specificity. ROC curve |
| 12:00 pm - 1:00 pm | Lunch break |
| 1:00 pm - 2:30 pm | Practical |
| 2:30 pm - 2:45 pm | Coffee break |
| 2:45 pm - 4:00 pm | Practical |

July 31: Categorical variables and interactions. Spatial analysis

| Time | Session |
| --- | --- |
| 9:00 am - 10:00 am | Categorical variables |
| 10:00 am - 10:45 am | Interactions |
| 10:45 am - 11:00 am | Coffee break |
| 11:00 am - 12:00 pm | Practical |
| 12:00 pm - 1:00 pm | Lunch break |
| 1:00 pm - 2:30 pm | Spatial analysis |
| 2:30 pm - 2:45 pm | Coffee break |
| 2:45 pm - 3:30 pm | Spatial analysis |
| 3:30 pm - 4:00 pm | Wrap-up |

## References

1. Faraway (2005) Linear Models with R, Chapman & Hall/CRC
2. Faraway (2006) Extending the Linear Model with R, Chapman & Hall/CRC