00102892: Statistical Learning
Course Description
This is an introductory statistical machine learning course for graduate and upper-level undergraduate students in statistics, applied
mathematics, computer science, and other fields that involve learning from data. The course covers fundamental principles of machine learning
and major topics in supervised, unsupervised, and semi-supervised learning, including linear regression and classification, spline and kernel
smoothing, model selection and regularization, additive models, tree-based methods, support vector machines, clustering, principal component
analysis, nonnegative matrix factorization, and graphical models.
Syllabus
Lectures and Assignments
Week | Date | Topics | References | Assignments | Notes and Further Reading |
1 | 9/21 | Introduction, no free lunch theorem | Zhou Chap. 1 | | |
| 9/24 | Classical linear regression | ESL Sec. 3.2 | | See Chap. 3 of Seber and Lee (2003) for a theoretical treatment of classical linear regression. |
2 | 9/28 | Best subset selection, ridge regression | ESL Secs. 3.3, 3.4.1 | Homework 1, due 10/12 | See Hastie (2020) for an updated review of ridge regularization. |
3 | 10/5 | No class | | | |
| 10/8 | No class | | | |
4 | 10/12 | Lasso and its variants | ESL Secs. 3.4, 3.8 | | |
5 | 10/19 | Algorithms for Lasso | ESL Sec. 3.8 | | See Boyd et al. (2011) for a review of ADMM. |
| 10/22 | Theory for Lasso | Lecture notes, Wainwright Secs. 7.3, 7.4 | | See Wainwright Secs. 7.2 and 7.5 for the noiseless setting and variable selection consistency. |
6 | 10/26 | Linear classification | ESL Chap. 4 | Homework 2, due 11/2 | |
7 | 11/2 | Regression and smoothing splines | ESL Secs. 5.1–5.7 | | |
| 11/5 | Reproducing kernel Hilbert spaces | ESL Sec. 5.8 | | See Wainwright Chap. 12 for a more thorough discussion of RKHS. |
8 | 11/9 | Kernel smoothing | ESL Chap. 6, Tsybakov Sec. 1.2.1 | | See Chap. 24 of van der Vaart (1998) for an alternative treatment of kernel density estimation, including rate optimality. |
9 | 11/16 | Model selection | ESL Secs. 7.1–7.7 | Homework 3, due 11/23 | |
| 11/19 | Bootstrap, decision trees | ESL Secs. 7.10–7.12, 9.2 | | |
10 | 11/23 | Multivariate adaptive regression splines, boosting | ESL Secs. 9.3–9.5, 10.1–10.6 | Final project | |
11 | 11/30 | Gradient boosting, support vector machines | ESL Secs. 10.9–10.12, 12.1–12.3 | | |
| 12/3 | K-means, principal components | ESL Secs. 14.3, 14.5.1, 14.5.5 | | |
12 | 12/7 | Midterm | | Homework 4, due 12/17 | Mean = 53, median = 57, Q1 = 37, Q3 = 69, high score = 91 |
13 | 12/14 | Spectral clustering, nonnegative matrix factorization | ESL Secs. 14.5.3, 14.6 | | |
| 12/17 | Ensemble learning, random forests | ESL Chaps. 15, 16 | | |
14 | 12/21 | Gaussian graphical models | Lecture notes, Wainwright Chap. 11 | | |
15 | 12/28 | Directed acyclic graphs | | Homework 5, due 1/4 | See Kalisch and Bühlmann (2007) for basic concepts and the PC algorithm. |
| 12/31 | Oral presentations | | | |
16 | 1/4 | Oral presentations | | | |