00103335: Deep Learning and Reinforcement Learning
Course Description
Deep learning and reinforcement learning are highly successful, widely applied machine learning methods and
are the core techniques underlying the latest major breakthroughs in AI. Building on the general principles
and methodology of machine learning, and motivated by important practical problems, this course introduces
the basic concepts and methods, mathematical foundations and theory, optimization algorithms, and
applications and case studies of deep learning and reinforcement learning. The deep learning part covers
feedforward neural networks, regularization and optimization for deep learning, convolutional neural
networks, recurrent neural networks, and autoencoders and generative models; the reinforcement learning part
covers multi-armed bandits, Markov decision processes, dynamic programming, Monte Carlo methods,
temporal-difference learning, and deep reinforcement learning.
Syllabus
Lectures and Assignments
Week | Date | Topics | References | Assignments | Notes |
1 | 9/9 | Introduction, machine learning basics | FML Chap. 1, DL Chap. 5 | | |
| 9/11 | Bias–variance trade-off, PAC framework | FML Secs. 4.1, 2.1 | | |
2 | 9/18 | Finite hypothesis sets, Rademacher complexity | FML Secs. 2.2–2.4, 3.1 | FML 2.1, 2.3, 2.7, 2.9, 2.10, 2.12 | |
3 | 9/23 | Growth function, VC dimension | FML Secs. 3.2, 3.3 | | |
| 9/25 | Lower bounds, introduction to DL | FML Sec. 3.4, DL Sec. 5.11 | FML 3.2, 3.4, 3.8, 3.12, 3.16, 3.17, 3.23, 3.24, 3.31; supplementary problems | Homework 1 due 10/9 |
4 | 10/2 | No class | | | |
5 | 10/7 | No class | | | |
| 10/9 | Feedforward networks | DL Chap. 6 | | |
6 | 10/16 | Universal approximation | DL Sec. 6.4.1, UML Secs. 20.3, 20.4, Leshno et al. (1993) | | |
7 | 10/21 | Regularization for DL | DL Chap. 7 | | |
| 10/23 | Optimization for DL | DL Chap. 8 | | |
8 | 10/30 | Convolutional and recurrent networks | DL Chap. 9, Secs. 10.1–10.5 | | |
9 | 11/4 | LSTM, transformers | DL Secs. 10.7–10.12, UDL Chap. 12 | Homework 2 | Homework 2 due 11/20 |
| 11/6 | Introduction to RL, multi-armed bandits, Markov decision processes | RL Chaps. 1, 2, Secs. 3.1–3.4 | | |
10 | 11/13 | Bellman equations, dynamic programming | RL Secs. 3.5–3.7, Chap. 4 | | |
11 | 11/18 | Monte Carlo methods, temporal-difference prediction | RL Chap. 5, Secs. 6.1–6.3 | | |
| 11/20 | Temporal-difference control, on-policy prediction with approximation | RL Secs. 6.4–6.8, 7.1, 12.1, Chap. 9 | | |
12 | 11/27 | On-policy control with approximation, policy gradient theorem | RL Chap. 10, Secs. 13.1, 13.2 | | |
13 | 12/2 | Policy gradient methods, planning | RL Secs. 13.3–13.7, Chap. 8 | RL 2.2, 2.4, 2.8; 3.15, 3.22; 4.2, 4.4; 5.1, 5.4; 6.1, 6.14; 9.5; 10.6; 13.1, 13.3; 8.5 | Homework 3 due 12/16 |
| 12/4 | Generative models, autoencoders, Boltzmann machines | UDL Chap. 14, DL Chap. 14, Secs. 20.1–20.8 | | |
14 | 12/11 | VAEs, GANs, normalizing flows, diffusion models | UDL Chaps. 15–18 | | |
15 | 12/16 | Neural tangent kernels, mean field theory | TDL Chap. 9, LTFP Sec. 12.3 | | |
| 12/18 | Oral presentations | | | |
16 | 12/25 | Oral presentations | | | |