00102892: Statistical Learning
Course Description
This is an introductory statistical machine learning course for graduate and upper-level undergraduate students in statistics, applied
mathematics, computer science, and other fields that involve learning from data. The course covers fundamental principles of machine learning
and major topics in supervised, unsupervised, and semi-supervised learning, including linear regression and classification, spline and kernel
smoothing, model selection and regularization, additive models, tree-based methods, support vector machines, clustering, principal component
analysis, nonnegative matrix factorization, and graphical models.
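The course opens with linear regression and least squares (ESL Sec. 3.2). As a small taste of the material, here is a minimal least-squares fit in Python; the data and all variable names are synthetic, invented purely for illustration:

```python
# Minimal illustration of least squares (cf. ESL Sec. 3.2):
# recover beta_hat = argmin ||y - X beta||^2 on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])  # intercept + 3 predictors
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)                # small Gaussian noise

# Solve the least-squares problem directly (numerically more stable
# than forming and inverting X^T X explicitly).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```

With 100 observations and little noise, the estimated coefficients land close to the true ones; ridge regression and the Lasso (weeks 2 and 4) modify this objective with a penalty on beta.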
Syllabus
Final project
Lectures and Assignments
| Week | Date | Topics | References | Assignments | Notes and Further Reading |
| --- | --- | --- | --- | --- | --- |
| 1 | 9/19 | Basics of machine learning; Occam's razor and no-free-lunch theorems | ML Chap. 1 | ML 1.4 | |
| | 9/21 | Linear regression and least squares | ESL Secs. 2.2, 3.2 | ESL 3.3, 3.4 | |
| 2 | 9/26 | Multivariate linear regression; subset selection; ridge regression | ESL Secs. 3.2.4, 3.3, 3.4.1 | ESL 3.5, 3.6, 3.11, 3.12 | For seemingly unrelated regressions, see Zellner (1962); for mixed integer optimization, see Bertsimas et al. (2016). Homework 1 complete, due 10/10 |
| 3 | 10/3, 10/5 | National Day (no class) | | | |
| 4 | 10/10 | Lasso and its variants; model selection consistency of the Lasso | ESL Secs. 3.4.2, 3.4.3, 3.8.3, 3.8.5 | ESL 3.16, 3.28, 3.30 | For model selection consistency of the Lasso, see Zhao and Yu (2006) and Wainwright (2009); for MCP, see Zhang (2010). |
| 5 | 10/17 | More theory and algorithms for the Lasso | ESL Secs. 3.4.4, 3.8.6, 3.9 | ESL 3.23, 3.24 | For comparisons of conditions for the Lasso, see van de Geer and Bühlmann (2009); for ADMM, see Boyd et al. (2011). |
| | 10/19 | Group Lasso; regularized multivariate linear regression; linear and quadratic discriminant analysis | ESL Secs. 3.8.4, 3.7, 4.1–4.3 | ESL 4.2, 4.3 | For nuclear-norm regularized multivariate linear regression, see Yuan et al. (2007); for sparse discriminant analysis, see Mai et al. (2012). Homework 2 complete, due 10/24 |
| 6 | 10/24 | Logistic regression; separating hyperplanes | ESL Secs. 4.4, 4.5 | ESL 4.5, 4.7 | |
| 7 | 10/31 | Regression splines | ESL Secs. 5.1–5.3 | ESL 5.4, 5.7 | For nonlinear interaction models, see Radchenko and James (2010); for piecewise-constant approximation in survival models, see Zeng and Lin (2007). |
| | 11/2 | Smoothing splines; multidimensional splines | ESL Secs. 5.4–5.7 | ESL 5.13 | Homework 3 complete, due 11/7 |
| 8 | 11/7 | Reproducing kernel Hilbert spaces; wavelets | ESL Secs. 5.8, 5.9 | ESL 5.15 | |
| 9 | 11/14 | Kernel smoothing; local polynomial regression | ESL Secs. 6.1–6.5 | ESL 6.2, 6.3, 6.5 | For generalized partially linear single-index models, see Carroll et al. (1997). |
| | 11/16 | Midterm 1; kernel density estimation | ESL Sec. 6.6.1 | | Asymptotic properties of kernel density estimators were adapted from Tsybakov (2009), Sec. 1.2. Midterm 1: mean = 46, median = 44, Q1 = 33, Q3 = 58, high score = 89 |
| 10 | 11/21 | Kernel density classification and naive Bayes; model assessment and selection | ESL Secs. 6.6.2, 6.6.3, 7.1–7.3 | ESL 6.8, 7.2; Lab 1 | Homework 4 complete, due 11/28; Lab 1 due 12/5 |
| 11 | 11/28 | Estimation of generalization error; information criteria | ESL Secs. 7.4–7.9 | ESL 7.6, 7.7 | For the AIC–BIC dilemma, see Yang (2005) and van Erven et al. (2012). |
| | 11/30 | Cross-validation and the bootstrap; generalized additive models; classification and regression trees | ESL Secs. 7.10–7.12, 9.1, 9.2 | | For a review of diversity indices, see Morris et al. (2014). |
| 12 | 12/5 | Bump hunting; multivariate adaptive regression splines; hierarchical mixtures of experts; boosting | ESL Secs. 9.3–9.5, 10.1–10.4 | ESL 10.2, 10.5 | Schapire and Freund (2012) is a book-length treatment of boosting. |
| 13 | 12/12 | More on boosting; boosting trees; gradient boosting | ESL Secs. 10.5, 10.6, 10.9–10.12 | ESL 10.8 | Homework 5 complete, due 12/14 |
| | 12/14 | Support vector machines for classification and regression | ESL Secs. 12.1–12.3; ML Chap. 6 | ESL 12.1, 12.2 | For multiclass SVMs, see Lee et al. (2004). |
| 14 | 12/19 | Clustering; principal component analysis | ESL Secs. 14.3, 14.5 | ESL 14.2, 14.7 | Consistency of K-means clustering was studied by Pollard (1981). |
| 15 | 12/26 | Spectral clustering; nonnegative matrix factorization | ESL Secs. 14.5.3, 14.6 | ESL 14.21, 14.23 | For consistency of spectral clustering and its application to community detection in social network models, see von Luxburg et al. (2008) and Rohe et al. (2011). Homework 6 complete, due 1/2 |
| | 12/28 | Midterm 2 | | | Mean = 57, median = 56, Q1 = 45, Q3 = 67, high score = 93 |
| 16 | 1/2 | Ensemble learning; random forests; Gaussian graphical models | ESL Chaps. 15–17 | | For recent theoretical and methodological developments on random forests, see Biau and Scornet (2016). |

