00103335: Deep Learning and Reinforcement Learning
Course Description
Deep learning and reinforcement learning are highly successful, widely applied machine learning methods and the core techniques underlying the latest major breakthroughs in AI. Building on the general principles and methodology of machine learning and motivated by important practical problems, this
course will introduce the basic concepts and methods, mathematical foundations and theory, optimization
algorithms, and applications and case studies of deep learning and reinforcement learning. The part on deep
learning will cover feedforward neural networks, regularization and optimization for deep learning,
convolutional neural networks, recurrent neural networks, and autoencoders and generative models; the part
on reinforcement learning will cover multi-armed bandits, Markov decision processes, dynamic programming,
Monte Carlo methods, temporal difference learning, and deep reinforcement learning.
Syllabus
Lectures and Assignments
Week | Date | Topics | References | Assignments | Notes |
--- | --- | --- | --- | --- | --- |
1 | 9/9 | Introduction, machine learning basics | DL Sec. 5.1 | | |
2 | 9/14 | Generalization, bias–variance trade-off | DL Secs. 5.2–5.4 | | See Belkin et al. (2019) for the double descent phenomenon and Hastie et al. (2022) for a theory in linear models. |
| 9/16 | Inference principles, feedforward networks | DL Secs. 5.4–5.11, 6.1–6.3 | | |
3 | 9/23 | Universal approximation, back-propagation | DL Secs. 6.4–6.6 | | |
4 | 9/28 | Regularization for DL, weight decay | DL Secs. 7.1–7.5 | | |
| 9/30 | Early stopping, dropout | DL Secs. 7.8–7.14 | Homework 1 | |
5 | 10/7 | Optimization for DL | DL Secs. 8.1–8.3 | | |
6 | 10/12 | Initialization, adaptive algorithms | DL Secs. 8.4–8.7 | | |
| 10/14 | Convolutional networks | DL Chap. 9 | | See He et al. (2016) for residual networks. |
7 | 10/21 | Recurrent networks | DL Chap. 10 | Homework 2 | |
8 | 10/26 | Reinforcement learning basics, multi-armed bandits | RL Chaps. 1 & 2 | | |
| 10/28 | Markov decision processes | RL Chap. 3 | | |
9 | 11/4 | Dynamic programming | RL Chap. 4 | | |
10 | 11/9 | Monte Carlo methods | RL Chap. 5 | | Recent progress on the open question of the convergence of Monte Carlo Exploring Starts (MCES) was made by Wang et al. (2022). |
| 11/11 | Temporal difference learning | RL Chap. 6, Secs. 7.1–7.3, 12.1 | Homework 3 | |
11 | 11/18 | Midterm exam | | | Mean = 43, median = 43.5, Q1 = 34, Q3 = 54.5, high score = 75 |
12 | 11/23 | Autoencoders, approximate inference | DL Chaps. 14 & 19 | | |
| 11/25 | Deep generative models | DL Chap. 20 | | For diffusion models, see Sohl-Dickstein et al. (2015) and Ho, Jain & Abbeel (2020). |
13 | 12/2 | On-policy prediction with approximation | RL Chap. 9 | | |
14 | 12/7 | On-policy control with approximation, policy gradient methods | RL Chaps. 10 & 13 | Homework 4 | |
| 12/9 | Oral presentations | | | |
15 | 12/16 | Oral presentations | | | Written report due 12/21 |