00103335: Deep Learning and Reinforcement Learning
Course Description
Deep learning and reinforcement learning are highly successful, widely applied machine learning methods and the core techniques underlying the latest major breakthroughs in AI. Building on the general principles and methodology of machine learning and motivated by important practical problems, this
course will introduce the basic concepts and methods, mathematical foundations and theory, optimization
algorithms, and applications and case studies of deep learning and reinforcement learning. The part on deep
learning will cover feedforward neural networks, regularization and optimization for deep learning,
convolutional neural networks, recurrent neural networks, and autoencoders and generative models; the part
on reinforcement learning will cover multi-armed bandits, Markov decision processes, dynamic programming,
Monte Carlo methods, temporal difference learning, and deep reinforcement learning.
Syllabus
Lectures and Assignments
Week | Date | Topics | References | Assignments | Notes |
--- | --- | --- | --- | --- | --- |
1 | 9/9 | Introduction, machine learning basics | DL Sec. 5.1 | | |
2 | 9/14 | Generalization, bias–variance trade-off | DL Secs. 5.2–5.4 | | See Belkin et al. (2019) for the double descent phenomenon and Hastie et al. (2022) for a theory in linear models. |
| 9/16 | Inference principles, feedforward networks | DL Secs. 5.4–5.11, 6.1–6.3 | | |
3 | 9/23 | Universal approximation, back-propagation | DL Secs. 6.4–6.6 | | |
4 | 9/28 | Regularization for DL, weight decay | DL Secs. 7.1–7.5 | | |
| 9/30 | Early stopping, dropout | DL Secs. 7.8–7.14 | Homework 1 | |
5 | 10/7 | Optimization for DL | DL Secs. 8.1–8.3 | | |
6 | 10/12 | Initialization, adaptive algorithms | DL Secs. 8.4–8.7 | | |
| 10/14 | Convolutional networks | DL Chap. 9 | | See He et al. (2016) for residual networks. |
7 | 10/21 | Recurrent networks | DL Chap. 10 | Homework 2 | |
8 | 10/26 | Reinforcement learning basics, multi-armed bandits | RL Chaps. 1 & 2 | | |
| 10/28 | Markov decision processes | RL Chap. 3 | | |
9 | 11/4 | Dynamic programming | RL Chap. 4 | | |
10 | 11/9 | Monte Carlo methods | RL Chap. 5 | | Recent progress on the open question of the convergence of Monte Carlo Exploring Starts (MCES) was made by Wang et al. (2022). |
| 11/11 | Temporal difference learning | RL Chap. 6, Secs. 7.1–7.3, 12.1 | Homework 3 | |
11 | 11/18 | Midterm exam | | | Mean = 43, median = 43.5, Q1 = 34, Q3 = 54.5, high score = 75 |
12 | 11/23 | Autoencoders, approximate inference | DL Chaps. 14 & 19 | | |
| 11/25 | Deep generative models | DL Chap. 20 | | For diffusion models, see Sohl-Dickstein et al. (2015) and Ho, Jain & Abbeel (2020). |
13 | 12/2 | On-policy prediction with approximation | RL Chap. 9 | | |
14 | 12/7 | On-policy control with approximation, policy gradient methods | RL Chaps. 10 & 13 | Homework 4 | |
| 12/9 | Oral presentations | | | |
15 | 12/16 | Oral presentations | | | Written report due 12/21 |