Wei Lin @ PKU

00103335: Deep Learning and Reinforcement Learning

Course Description

Deep learning and reinforcement learning are highly successful, widely applied machine learning methods and the core techniques behind the latest major breakthroughs in AI. Building on the general principles and methodology of machine learning, and motivated by important practical problems, this course introduces the basic concepts and methods, mathematical foundations and theory, optimization algorithms, and applications and case studies of both fields. The deep learning part covers feedforward neural networks, regularization and optimization for deep learning, convolutional neural networks, recurrent neural networks, and autoencoders and generative models; the reinforcement learning part covers multi-armed bandits, Markov decision processes, dynamic programming, Monte Carlo methods, temporal-difference learning, and deep reinforcement learning.
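For a taste of the reinforcement-learning half of the course, the snippet below is a minimal NumPy sketch of tabular TD(0) value prediction on the five-state random walk from the RL textbook (Example 6.2). It is an illustration only, not part of the course materials, and the function name and default parameters are my own choices.

    import numpy as np

    def td0_random_walk(num_episodes=1000, alpha=0.1, gamma=1.0, seed=0):
        # Tabular TD(0) prediction on the 5-state random walk (RL Example 6.2).
        # States 1-5 are non-terminal; 0 and 6 are terminal. The only nonzero
        # reward is +1 for reaching state 6, and the policy moves left or
        # right with equal probability.
        rng = np.random.default_rng(seed)
        V = np.zeros(7)  # V[0] and V[6] are terminal and stay 0
        for _ in range(num_episodes):
            s = 3  # every episode starts in the middle state
            while s not in (0, 6):
                s_next = s + rng.choice((-1, 1))
                r = 1.0 if s_next == 6 else 0.0
                # TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
                V[s] += alpha * (r + gamma * V[s_next] - V[s])
                s = s_next
        return V[1:6]

    print(np.round(td0_random_walk(), 2))  # true values: 1/6, 2/6, ..., 5/6

With a constant step size the estimates keep fluctuating around the true values 1/6, ..., 5/6; conditions under which such updates converge are among the topics treated in the temporal-difference lectures.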

Syllabus

Lectures and Assignments

Week | Date | Topics | References | Assignments | Notes
1 | 9/9 | Introduction, machine learning basics | FML Chap. 1, DL Chap. 5 | |
| 9/11 | Bias–variance trade-off, PAC framework | FML Secs. 4.1, 2.1 | |
2 | 9/18 | Finite hypothesis sets, Rademacher complexity | FML Secs. 2.2–2.4, 3.1 | FML 2.1, 2.3, 2.7, 2.9, 2.10, 2.12 |
3 | 9/23 | Growth function, VC dimension | FML Secs. 3.2, 3.3 | |
| 9/25 | Lower bounds, introduction to DL | FML Sec. 3.4, DL Sec. 5.11 | FML 3.2, 3.4, 3.8, 3.12, 3.16, 3.17, 3.23, 3.24, 3.31; supplementary problems | Homework 1 due 10/9
4 | 10/2 | No class | | |
5 | 10/7 | No class | | |
| 10/9 | Feedforward networks | DL Chap. 6 | |
6 | 10/16 | Universal approximation | DL Sec. 6.4.1, UML Secs. 20.3, 20.4, Leshno et al. (1993) | |
7 | 10/21 | Regularization for DL | DL Chap. 7 | |
| 10/23 | Optimization for DL | DL Chap. 8 | |
8 | 10/30 | Convolutional and recurrent networks | DL Chap. 9, Secs. 10.1–10.5 | |
9 | 11/4 | LSTM, transformers | DL Secs. 10.7–10.12, UDL Chap. 12 | Homework 2 | Homework 2 due 11/20
| 11/6 | Introduction to RL, multi-armed bandits, Markov decision processes | RL Chaps. 1, 2, Secs. 3.1–3.4 | |
10 | 11/13 | Bellman equations, dynamic programming | RL Secs. 3.5–3.7, Chap. 4 | |
11 | 11/18 | Monte Carlo methods, temporal-difference prediction | RL Chap. 5, Secs. 6.1–6.3 | |
| 11/20 | Temporal-difference control, on-policy prediction with approximation | RL Secs. 6.4–6.8, 7.1, 12.1, Chap. 9 | |
12 | 11/27 | On-policy control with approximation, policy gradient theorem | RL Chap. 10, Secs. 13.1, 13.2 | |
13 | 12/2 | Policy gradient methods, planning | RL Secs. 13.3–13.7, Chap. 8 | RL 2.2, 2.4, 2.8; 3.15, 3.22; 4.2, 4.4; 5.1, 5.4; 6.1, 6.14; 9.5; 10.6; 13.1, 13.3; 8.5 | Homework 3 due 12/16
| 12/4 | Generative models, autoencoders, Boltzmann machines | UDL Chap. 14, DL Chap. 14, Secs. 20.1–20.8 | |
14 | 12/11 | VAEs, GANs, normalizing flows, diffusion models | UDL Chaps. 15–18 | |
15 | 12/16 | Neural tangent kernels, mean-field theory | TDL Chap. 9, LTFP Sec. 12.3 | |
| 12/18 | Oral presentations | | |
16 | 12/25 | Oral presentations | | |