2026-04-16
Machine Learning and Data Science PhD Student Forum Series (Session 100): Distributional Temporal Difference Learning with Linear Function Approximation
Abstract:
In this talk, we study the finite-sample statistical efficiency of distributional temporal difference (TD) learning with linear function approximation. Distributional TD learning aims to estimate the full return distribution of a given policy in a discounted Markov decision process. Building on our proposed algorithms, we show that, with linear function approximation, learning the entire return distribution from streaming data is no more difficult than learning its expectation (the value function). Furthermore, variance reduction techniques can be used to achieve tighter sample complexity bounds that are independent of the support size. This talk provides new theoretical insight into when and why distributional reinforcement learning can be statistically efficient, bridging the gap between distributional and classical TD methods in the linear function approximation regime. We will also present empirical results illustrating the strengths and limitations of our methods.
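The abstract does not specify the algorithmic details, so the following is only a minimal illustrative sketch of one standard instantiation: categorical distributional TD learning from streaming transitions, with the per-state return PMF parameterized linearly as phi(s)^T W over a fixed atom support. The toy Markov reward process, the one-hot features, and all numerical constants are hypothetical choices for the example, not the speaker's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-state Markov reward process (hypothetical example).
gamma = 0.9
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.3, 0.3, 0.4]])
R = np.array([1.0, 0.0, -1.0])   # reward received upon leaving each state
n = len(R)

# Categorical support: K atoms spanning the range of possible returns.
K, z_min, z_max = 21, -10.0, 10.0
z = np.linspace(z_min, z_max, K)
dz = z[1] - z[0]

# Linear parameterization: the predicted return PMF at state s is phi(s)^T W,
# with W in R^{d x K}.  One-hot features (d = n) keep the toy easy to check;
# any feature map phi would slot in unchanged.
Phi = np.eye(n)
W = np.zeros((n, K))
W[:, K // 2] = 1.0               # initialize every state at a point mass on z = 0

def project(atoms, probs):
    """Categorical projection of a discrete distribution onto the fixed support z."""
    out = np.zeros(K)
    b = np.clip((atoms - z_min) / dz, 0, K - 1)
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    np.add.at(out, lo, probs * (hi - b))       # split mass between neighbors
    np.add.at(out, hi, probs * (b - lo))
    same = lo == hi                            # atom falls exactly on a grid point
    np.add.at(out, lo[same], probs[same])
    return out

# Streaming TD updates along a single trajectory.
s, alpha = 0, 0.1
for t in range(200_000):
    s2 = rng.choice(n, p=P[s])
    pred = Phi[s] @ W                          # current PMF estimate at s
    target = project(R[s] + gamma * z, Phi[s2] @ W)
    # Semi-gradient step on the atom-wise squared (Cramer-type) loss.
    W += alpha * np.outer(Phi[s], target - pred)
    s = s2
    if t == 100_000:
        alpha = 0.01                           # smaller step to shrink the noise floor

# The mean of each learned distribution should approximate the value function.
means = Phi @ W @ z
V = np.linalg.solve(np.eye(n) - gamma * P, R)
print(np.round(means, 2), np.round(V, 2))
```

Because the update is a convex combination of valid PMFs, each row of Phi @ W remains a probability vector, and the means follow ordinary TD(0) dynamics, which is one way to see the "no harder than learning the expectation" claim at the level of a toy example.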
About the forum: This online forum is organized by Professor Zhihua Zhang's machine learning lab and is held biweekly (except on public holidays). Each session invites a PhD student to give a systematic, in-depth introduction to a frontier topic; themes include, but are not limited to, machine learning, high-dimensional statistics, operations research and optimization, and theoretical computer science.
Department of Information Science, School of Mathematical Sciences