Wei Lin @ PKU

00102892: Statistical Learning

Course Description

This is an introductory statistical machine learning course for graduate and upper-level undergraduate students in statistics, applied mathematics, computer science, and other fields that involve learning from data. The course covers fundamental principles of machine learning and major topics in supervised, unsupervised, and semi-supervised learning, including linear regression and classification, spline and kernel smoothing, model selection and regularization, additive models, tree-based methods, support vector machines, clustering, principal component analysis, nonnegative matrix factorization, and graphical models.

Syllabus

Lectures and Assignments

Week 1
9/9: Introduction; the no-free-lunch theorem. References: Zhou Chap. 1.
9/11: Classical linear regression. References: ESL Secs. 2.2, 3.2. Notes: Seemingly unrelated regressions are an example where estimating the error covariance can improve efficiency; see Zellner (1962).

Week 2
9/16: Sparse linear regression. References: ESL Secs. 3.3, 3.4. Assignment: Homework 1, due 9/23. Notes: Tibshirani (2011) provides a retrospective view of the Lasso and its variants; Fan and Lv (2010) complements it by emphasizing nonconvex penalties and feature screening methods.

Week 3
9/23: Theory for the Lasso. References: Wainwright Sec. 7.5. Notes: See Wainwright Secs. 7.3 and 7.4 for theory via the restricted eigenvalue condition, and van de Geer and Bühlmann (2009) for a comparison of the various conditions.
9/25: Algorithms for the Lasso; linear discriminant analysis. References: ESL Secs. 3.8, 4.1–4.3. Notes: See Boyd et al. (2011) for ADMM, Yuan et al. (2007) for low-rank regression, and Mai et al. (2012) for sparse LDA. A coordinate descent sketch for the Lasso appears under Code Sketches below.

Week 4
9/30: No class (National Day).

Week 5
10/7: Logistic regression; separating hyperplanes. References: ESL Secs. 4.4, 4.5. Assignment: Homework 2, due 10/14.
10/9: Splines. References: ESL Secs. 5.1–5.6. Notes: For P-splines, which combine the ideas of regression splines and smoothing splines, see Eilers and Marx (1996).

Week 6
10/14: Reproducing kernel Hilbert spaces; kernel smoothing. References: ESL Secs. 5.7, 5.8, 6.1. Notes: For a more formal introduction to reproducing kernel Hilbert spaces, see Wainwright Chap. 12.

Week 7
10/21: Local polynomial regression; kernel density estimation. References: ESL Secs. 6.1–6.6; Tsybakov Sec. 1.2.1. Assignment: Homework 3, due 10/28. Notes: For the construction of higher-order kernels using Legendre polynomials, see Tsybakov Sec. 1.2.2. A kernel density estimation sketch appears under Code Sketches below.
10/23: Naive Bayes; principles of model selection; AIC. References: ESL Secs. 6.6, 7.1–7.6.

Week 8
10/28: BIC; the bootstrap; generalized additive models. References: ESL Secs. 7.7–7.12, 9.1. Notes: The AIC–BIC dilemma is discussed by Yang (2005) and van Erven et al. (2012).

Week 9
11/4: Classification and regression trees; multivariate adaptive regression splines. References: ESL Secs. 9.2–9.5. Notes: A review of diversity indices is given by Morris et al. (2014).
11/6: Midterm exam. Notes: Mean = 49, median = 51, Q1 = 33, Q3 = 66, high score = 94.

Week 10
11/11: Boosting. References: ESL Secs. 10.1–10.6, 10.9–10.12. Assignment: Homework 4, due 11/18. Notes: Schapire and Freund (2012) is a book-length treatment of boosting.

Week 11
11/18: Support vector machines. References: ESL Secs. 12.1–12.3. Notes: Multiclass SVMs were considered by Lee et al. (2004).
11/20: K-means clustering; principal component analysis. References: ESL Secs. 14.3, 14.5. Notes: Consistency of K-means clustering was established by Pollard (1981). A K-means sketch appears under Code Sketches below.

Week 12
11/25: Spectral clustering; nonnegative matrix factorization. References: ESL Secs. 14.5.3, 14.6. Notes: For consistency of spectral clustering and its application to community detection in social network models, see von Luxburg et al. (2008) and Rohe et al. (2011).

Week 13
12/2: Gaussian graphical models. References: ESL Secs. 17.1–17.3; Wainwright Chap. 11. Assignment: Homework 5, due 12/16. Notes: The CLIME method was proposed by Cai et al. (2011).
12/4: Directed acyclic graphs. References: Lecture notes. Notes: Estimation of high-dimensional, sparse DAGs is considered by Kalisch and Bühlmann (2007) and van de Geer and Bühlmann (2013).

Week 14
12/9: Ensemble learning; semi-supervised learning. References: ESL Chaps. 15 and 16; Zhou Chap. 13.

Week 15
12/16: Neural networks. References: Zhou Chap. 5; DL Chap. 6. Notes: See LeCun et al. (2015) for a recent review, and compare it with a much older one by Cheng and Titterington (1994).

Week 16
12/23: Oral presentations.
12/25: Oral presentations.
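
Code Sketches

The sketches below are illustrative companions to a few lectures, not official course materials. First, for the 9/25 lecture on algorithms for the Lasso: a minimal coordinate descent implementation in Python/NumPy, assuming y is centered and the columns of X are standardized so that each satisfies x_j'x_j / n = 1; the function names and synthetic data are made up for illustration.

    import numpy as np

    def soft_threshold(z, t):
        # Soft-thresholding operator: S(z, t) = sign(z) * max(|z| - t, 0).
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def lasso_cd(X, y, lam, n_iter=100):
        # Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1,
        # assuming centered y and columns of X scaled so x_j'x_j / n = 1.
        n, p = X.shape
        b = np.zeros(p)
        r = y.astype(float).copy()  # residual y - X b, with b = 0
        for _ in range(n_iter):
            for j in range(p):
                # Univariate fit to the partial residual for coordinate j.
                rho = X[:, j] @ (r + X[:, j] * b[j]) / n
                b_new = soft_threshold(rho, lam)
                r += X[:, j] * (b[j] - b_new)  # update residual in place
                b[j] = b_new
        return b

    # Toy example: sparse truth with 3 nonzero coefficients (illustrative).
    rng = np.random.default_rng(0)
    n, p = 100, 20
    X = rng.standard_normal((n, p))
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    beta = np.zeros(p)
    beta[:3] = [3.0, -2.0, 1.5]
    y = X @ beta + rng.standard_normal(n)
    y -= y.mean()
    print(np.round(lasso_cd(X, y, lam=0.1), 2))

With standardized columns, each coordinate update has the closed form b_j = S(rho_j, lambda), which is why soft-thresholding appears directly in the inner loop.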
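
For the 10/21 lecture: a minimal kernel density estimator with a Gaussian kernel, f_hat(x) = (1/(nh)) sum_i K((x - X_i)/h). The bandwidth h = 0.4 and the two-component mixture data are arbitrary choices for the toy example, not recommendations.

    import numpy as np

    def kde_gaussian(grid, data, h):
        # Gaussian-kernel density estimate evaluated at each grid point:
        # f_hat(x) = (1 / (n h)) * sum_i phi((x - X_i) / h).
        z = (grid[:, None] - data[None, :]) / h
        return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

    rng = np.random.default_rng(1)
    data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 0.75, 100)])
    grid = np.linspace(-6.0, 6.0, 400)
    f_hat = kde_gaussian(grid, data, h=0.4)
    print(round(grid[f_hat.argmax()], 2))  # grid point where the estimate peaks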
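
For the 11/20 lecture: a minimal sketch of Lloyd's algorithm for K-means. Initializing the centers by sampling k distinct data points is the simplest choice; in practice a careful initialization such as k-means++ is preferable.

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        # Lloyd's algorithm: alternate nearest-center assignment and mean updates.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assignment step: squared Euclidean distance to each center.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Update step: each center becomes the mean of its cluster;
            # an empty cluster keeps its previous center.
            new_centers = np.array([X[labels == j].mean(axis=0)
                                    if np.any(labels == j) else centers[j]
                                    for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return labels, centers

    # Toy example: two well-separated Gaussian blobs (illustrative).
    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
    labels, centers = kmeans(X, k=2)
    print(np.round(centers, 2))  # should land near (0, 0) and (5, 5)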