00103335: Deep Learning and Reinforcement Learning
Course Description
Deep learning and reinforcement learning are highly successful and widely applied machine
learning methods, and they are the core techniques underlying the latest major breakthroughs in
AI. Building on the general principles and methodology of machine learning, and motivated by
important practical problems, this course will introduce the basic concepts and methods,
mathematical foundations and theory, optimization algorithms, and applications and case studies
of deep learning and reinforcement learning. The part on deep learning will cover feedforward
neural networks, regularization and optimization for deep learning, convolutional neural
networks, recurrent neural networks, and autoencoders and generative models; the part on
reinforcement learning will cover multi-armed bandits, Markov decision processes, dynamic
programming, Monte Carlo methods, temporal difference learning, and deep reinforcement learning.
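
To give a concrete taste of the second half of the course, below is a minimal sketch of tabular TD(0) prediction on a toy random-walk chain (cf. RL Chap. 6). The environment, step size, and episode count are illustrative choices for this sketch, not part of the course materials.

```python
import random

# Tabular TD(0) prediction on a 5-state random walk (illustrative sketch).
# States 0..6; states 0 and 6 are terminal; reward +1 on reaching state 6.
ALPHA, GAMMA, EPISODES = 0.1, 1.0, 1000  # step size, discount, episode count

V = [0.0] * 7  # value estimates; terminal states stay at 0

for _ in range(EPISODES):
    s = 3  # every episode starts in the middle state
    while s not in (0, 6):
        s_next = s + random.choice((-1, 1))  # move left or right uniformly
        r = 1.0 if s_next == 6 else 0.0      # reward only at the right end
        # TD(0) update: V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]
        V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])
        s = s_next

print([round(v, 2) for v in V[1:6]])  # approaches [1/6, 2/6, 3/6, 4/6, 5/6]
```

The true value of state k in this chain is k/6, so the printed estimates give a quick correctness check.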
Syllabus
Lectures and Assignments
| Week | Date | Topics | References | Assignments | Notes |
|------|------|--------|------------|-------------|-------|
| 1 | 9/8 | Introduction | FML Chap. 1, UML Chap. 1, DL Sec. 5.1 | | |
| | 9/10 | No-free-lunch theorem, bias–variance trade-off | UML Chap. 5, DL Secs. 5.2–5.4 | | |
| 2 | 9/17 | PAC framework, finite hypothesis sets | FML Chap. 2 | FML 2.1, 2.3, 2.7, 2.9, 2.10, 2.12 | |
| 3 | 9/22 | Rademacher complexity | FML Sec. 3.1 | | |
| | 9/24 | Growth function, VC dimension | FML Secs. 3.2, 3.3 | FML 3.2, 3.4, 3.8, 3.12, 3.16, 3.17, 3.23, 3.24, 3.31; supplementary problems | Homework 1 due 10/15 |
| 4 | 10/1 | No class | | | |
| 5 | 10/6 | No class | | | |
| | 10/8 | No class | | | |
| 6 | 10/15 | Lower bounds, feedforward networks | FML Sec. 3.4, DL Secs. 5.11, 6.1–6.3 | | |
| 7 | 10/20 | Approximation theory | DL Sec. 6.4, UML Secs. 20.3, 20.4, Leshno et al. (1993) | | |
| | 10/22 | Backprop, explicit regularization | DL Secs. 6.5, 7.1–7.3 | | |
| 8 | 10/29 | Implicit regularization | DL Secs. 7.4–7.14 | | |
| 9 | 11/3 | Optimization for DL | DL Chap. 8 | | |
| | 11/5 | Convolutional and recurrent networks | DL Chap. 9, Secs. 10.1–10.5 | | |
| 10 | 11/12 | LSTM, transformers, introduction to RL | DL Secs. 10.7–10.10, UDL Chap. 12, RL Chap. 1 | Homework 2 | Homework 2 due 11/26 |
| 11 | 11/17 | Multi-armed bandits, Markov decision processes | RL Chap. 2, Secs. 3.1–3.4 | | |
| | 11/19 | Bellman equations, dynamic programming | RL Secs. 3.5–3.7, Chap. 4 | | |
| 12 | 11/26 | Midterm exam | | | Mean = 37, median = 36, Q1 = 24, Q3 = 49, high score = 93 |
| 13 | 12/1 | Monte Carlo methods | RL Chap. 5 | | |
| | 12/3 | Temporal difference learning, on-policy prediction with approximation | RL Chap. 6, Secs. 7.1, 12.1, Chap. 9 | | |
| 14 | 12/10 | On-policy control with approximation, policy gradient methods | RL Chaps. 10, 13 | RL 2.2, 2.4, 2.8, 3.15, 3.22, 4.2, 4.4, 5.1, 5.4, 6.1, 6.14, 9.5, 10.6, 13.1, 13.3 | Homework 3 due 12/24 |
| 15 | 12/15 | Generative models | UDL Chaps. 14–18 | | |
| | 12/17 | Oral presentations | | | |
| 16 | 12/24 | Oral presentations | | | Written report due 12/31 |