Wei Lin @ PKU

00137960: Statistical Thinking

Course Description

This course provides a compact and accessible introduction to statistics, focusing on the most important ideas that have shaped the field and influenced how we view and understand the world. Essential concepts, including data, models, algorithms, sampling, likelihood, information, hypothesis testing, regression, and causality, will be motivated and introduced. A comparative overview of frequentist and Bayesian inference will be presented. The discussion will be illustrated by examples from the physical, biological, and social sciences.

Syllabus

Lectures and Assignments

| Week | Date | Topics | References | Assignments | Notes |
| --- | --- | --- | --- | --- | --- |
| 1 | 2/17 | Introduction | Poldrack Chap. 1 | | |
| 2 | 2/24 | Data, aggregation and visualization | Poldrack Chaps. 2–4 | | The website ColorBrewer 2.0 provides guidance in choosing good colors for your plots. |
| | 2/26 | Benford's law | Hill (1995), Leemis et al. (2000), Tsagbey et al. (2017) | | A more thorough survey of Benford's law is Berger & Hill (2011). |
| 3 | 3/3 | Models, formal theory | Poldrack Chap. 5, McCullagh (2002) | | |
| 4 | 3/10 | Bias–variance trade-off, statistical modeling | ESL Secs. 7.2, 7.3, Breiman (2001) | Homework 1 due 3/31 | The AIC–BIC dilemma (Yang, 2005) exemplifies the conflict between prediction and inference. Reflections and updates on Breiman's two cultures in the big data era were given by Donoho (2017) and Efron (2020). |
| | 3/12 | Frequentist inference | Efron & Hastie Chap. 2 | | |
| 5 | 3/17 | Bayesian inference | Efron & Hastie Chap. 3 | | |
| 6 | 3/24 | Likelihood and MLE | Efron & Hastie Secs. 4.1, 4.2 | | The history of MLE was reviewed in Aldrich (1997) and Stigler (2007). |
| | 3/26 | Fisherian inference, parametric models | Efron & Hastie Secs. 4.3–5.2 | | The statistical triangle was suggested by Efron (1998). |
| 7 | 3/31 | Exponential families | Efron & Hastie Secs. 5.3–5.5 | Homework 2 due 4/14 | |
| 8 | 4/7 | Information and entropy | Cover & Thomas Chap. 1, Secs. 2.1–2.7, 8.1, 17.7 | | Lad et al. (2015) introduced the notion of extropy as a complementary dual to entropy. |
| | 4/9 | Linear regression | Poldrack Chap. 14, Seber & Lee Secs. 3.1–3.5 | | See Gorroochurn (2016) for more history on how Galton coined the name “regression,” and Aldrich (2005) on Fisher's contributions to fixed-X regression. |
| 9 | 4/14 | Generalized linear models | Efron & Hastie Chap. 8 | | |
| 10 | 4/21 | Hypothesis testing | Poldrack Chap. 9, Casella & Berger Sec. 8.3.4 | | See the ASA's statement on p-values and a reflection on its impact. |
| | 4/23 | Likelihood ratio tests, meta-analysis | Casella & Berger Sec. 10.3.1, Heard & Rubin-Delanchy (2018) | | |
| 11 | 4/28 | Multiple testing | Efron & Hastie Secs. 15.1–15.3, 15.5 | Homework 3 due 5/21 | For a retrospective look at the original FDR paper, see Benjamini (2010). |
| 12 | 5/5 | No class | | | |
| | 5/7 | No class | | | |
| 13 | 5/12 | Survival analysis | Efron & Hastie Secs. 9.1–9.3 | | |
| 14 | 5/19 | Cox regression, resampling methods | Efron & Hastie Secs. 9.4, 10.1, 10.2 | | Asymptotic theory for the Cox model was established by Andersen & Gill (1982) via counting process and martingale techniques. For alternative justifications using profile likelihood or nonparametric MLE, see Murphy & van der Vaart (2000) and Zeng & Lin (2007). |
| | 5/21 | Bootstrap, cross-validation | Efron & Hastie Secs. 10.3, 10.4, 11.1, 11.2, 12.1, 12.2 | | Shao & Tu (1995) is a neat introduction to the theory of the jackknife and bootstrap. |
| 15 | 5/26 | Stein's phenomenon and shrinkage | Efron & Hastie Secs. 7.1, 7.2, 7.4 | Final report due; example topics | The philosophical significance of Stein's paradox was explored by Vassend et al. (2017). |
| 16 | 6/2 | No class | | | |
| | 6/4 | Ridge regression, causal inference | Efron & Hastie Sec. 7.3, Wasserman's lecture notes | | Two authoritative reviews of causal inference are Rubin (2005) and Pearl (2009). |