Wei Lin @ PKU

00103335: Deep Learning and Reinforcement Learning

Course Description

Deep learning and reinforcement learning are highly successful, widely applied machine learning methods and the core techniques behind the latest major breakthroughs in AI. Building on the general principles and methodology of machine learning, and motivated by important practical problems, this course introduces the basic concepts and methods, mathematical foundations and theory, optimization algorithms, and applications and case studies of both fields. The deep learning part covers feedforward neural networks, regularization and optimization for deep learning, convolutional neural networks, recurrent neural networks, and autoencoders and generative models; the reinforcement learning part covers multi-armed bandits, Markov decision processes, dynamic programming, Monte Carlo methods, temporal-difference learning, and deep reinforcement learning.
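For a taste of the reinforcement-learning half of the course, the snippet below is a minimal NumPy sketch of tabular TD(0) value prediction on the five-state random walk from the RL textbook (Example 6.2). It is an illustration only, not part of the course materials, and the function name and default parameters are my own choices.

    import numpy as np

    def td0_random_walk(num_episodes=1000, alpha=0.1, gamma=1.0, seed=0):
        # Tabular TD(0) prediction on the 5-state random walk (RL Example 6.2).
        # States 1-5 are non-terminal; 0 and 6 are terminal. The only nonzero
        # reward is +1 for reaching state 6, and the policy moves left or
        # right with equal probability.
        rng = np.random.default_rng(seed)
        V = np.zeros(7)  # V[0] and V[6] are terminal and stay 0
        for _ in range(num_episodes):
            s = 3  # every episode starts in the middle state
            while s not in (0, 6):
                s_next = s + rng.choice((-1, 1))
                r = 1.0 if s_next == 6 else 0.0
                # TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
                V[s] += alpha * (r + gamma * V[s_next] - V[s])
                s = s_next
        return V[1:6]

    print(np.round(td0_random_walk(), 2))  # true values: 1/6, 2/6, ..., 5/6

With a constant step size the estimates keep fluctuating around the true values 1/6, ..., 5/6; conditions under which such updates converge are among the topics treated in the temporal-difference lectures.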

Syllabus

Lectures and Assignments

Week | Date | Topics | References | Assignments | Notes
1 | 9/9 | Introduction, machine learning basics | FML Chap. 1, DL Chap. 5 | |
| 9/11 | Bias–variance trade-off, PAC framework | FML Secs. 4.1, 2.1 | |
2 | 9/18 | Finite hypothesis sets, Rademacher complexity | FML Secs. 2.2–2.4, 3.1 | FML 2.1, 2.3, 2.7, 2.9, 2.10, 2.12 |
3 | 9/23 | Growth function, VC dimension | FML Secs. 3.2, 3.3 | |
| 9/25 | Lower bounds, introduction to DL | FML Sec. 3.4, DL Sec. 5.11 | FML 3.2, 3.4, 3.8, 3.12, 3.16, 3.17, 3.23, 3.24, 3.31; supplementary problems | Homework 1 due 10/9
4 | 10/2 | No class | | |
5 | 10/7 | No class | | |
| 10/9 | Feedforward networks | DL Chap. 6 | |
6 | 10/16 | Universal approximation | DL Sec. 6.4.1, UML Secs. 20.3, 20.4, Leshno et al. (1993) | |
7 | 10/21 | Regularization for DL | DL Chap. 7 | |
| 10/23 | Optimization for DL | DL Chap. 8 | |
8 | 10/30 | Convolutional and recurrent networks | DL Chap. 9, Secs. 10.1–10.5 | |
9 | 11/4 | LSTM, transformers | DL Secs. 10.7–10.12, UDL Chap. 12 | Homework 2 | Homework 2 due 11/20
| 11/6 | Introduction to RL, multi-armed bandits, Markov decision processes | RL Chaps. 1, 2, Secs. 3.1–3.4 | |
10 | 11/13 | Bellman equations, dynamic programming | RL Secs. 3.5–3.7, Chap. 4 | |
11 | 11/18 | Monte Carlo methods, temporal-difference prediction | RL Chap. 5, Secs. 6.1–6.3 | |
| 11/20 | Temporal-difference control, on-policy prediction with approximation | RL Secs. 6.4–6.8, 7.1, 12.1, Chap. 9 | |
12 | 11/27 | On-policy control with approximation, policy gradient theorem | RL Chap. 10, Secs. 13.1, 13.2 | |
13 | 12/2 | Policy gradient methods, planning | RL Secs. 13.3–13.7, Chap. 8 | RL 2.2, 2.4, 2.8; 3.15, 3.22; 4.2, 4.4; 5.1, 5.4; 6.1, 6.14; 9.5; 10.6; 13.1, 13.3; 8.5 | Homework 3 due 12/16
| 12/4 | Generative models, autoencoders, Boltzmann machines | UDL Chap. 14, DL Chap. 14, Secs. 20.1–20.8 | |
14 | 12/11 | VAEs, GANs, normalizing flows, diffusion models | UDL Chaps. 15–18 | |
15 | 12/16 | Neural tangent kernels, mean-field theory | TDL Chap. 9, LTFP Sec. 12.3 | |
| 12/18 | Oral presentations | | |
16 | 12/25 | Oral presentations | | |