2024-09-19

Machine Learning and Data Science PhD Student Seminar Series (Session 76) — Follow-the-Perturbed-Leader Achieves Best-of-Both-Worlds for Bandit Problems

Abstract:
Best-of-Both-Worlds (BOBW) bandit algorithms, which enjoy regret guarantees in both the stochastic and the adversarial settings simultaneously, have been studied for many years, and Tsallis-INF (along with other Follow-The-Regularized-Leader, or FTRL, policies) is one of the most promising frameworks for BOBW policies.

However, a limitation of FTRL policies is that the list of arm-selection probabilities must be computed explicitly at every round. The Follow-The-Perturbed-Leader (FTPL) policy has been studied as a promising candidate to circumvent this limitation. In this talk, we will introduce an FTPL algorithm with Fréchet perturbations that also achieves the BOBW bound, based on a recent work by Lee, Honda, Ito, and Oh (COLT 2024).
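
To make the contrast concrete, below is a minimal, illustrative FTPL sketch for a K-armed bandit with Fréchet perturbations. This is a generic sketch of the FTPL idea, not the exact algorithm analyzed in the COLT 2024 paper: the shape parameter alpha=2, the sqrt(t) learning-rate schedule, the resampling cap, and the `loss_fn` interface are all assumptions made for illustration. The learner perturbs the cumulative loss estimates, pulls the arm minimizing the perturbed estimate, and obtains the inverse selection probability needed for the loss estimator by geometric resampling, rather than by computing the probabilities explicitly.

```python
import numpy as np

def frechet(rng, size, alpha=2.0):
    """Sample standard Frechet(alpha) perturbations via the inverse CDF."""
    u = rng.uniform(1e-12, 1.0, size=size)
    return (-np.log(u)) ** (-1.0 / alpha)

def ftpl_frechet(K, T, loss_fn, seed=0, max_resample=10_000):
    """Minimal FTPL sketch for a K-armed bandit with Frechet perturbations.

    Cumulative loss estimates are perturbed, the arm with the smallest
    perturbed estimate is pulled, and the inverse selection probability
    is estimated by geometric resampling (no explicit probabilities needed).
    """
    rng = np.random.default_rng(seed)
    L_hat = np.zeros(K)             # cumulative loss estimates
    for t in range(1, T + 1):
        eta = np.sqrt(t)            # illustrative learning-rate schedule (assumption)
        z = frechet(rng, K)
        arm = int(np.argmin(L_hat - eta * z))
        loss = loss_fn(t, arm)      # observed loss of the pulled arm only

        # Geometric resampling: redraw perturbations until the same arm is
        # selected again; the trial count is an unbiased (capped) estimate
        # of 1 / p_t(arm).
        count = 1
        while count < max_resample:
            z = frechet(rng, K)
            if int(np.argmin(L_hat - eta * z)) == arm:
                break
            count += 1
        L_hat[arm] += loss * count  # importance-weighted loss estimate
        yield arm, loss
```

In an FTRL policy such as Tsallis-INF, by contrast, the selection probabilities p_t(i) must be computed explicitly (e.g., by solving a normalization equation) before each draw; the sketch above only ever evaluates perturbed argmins.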

About the forum: This online forum is organized by Professor Zhang Zhihua's machine learning lab and is held biweekly (except during public holidays). Each session invites a PhD student to give a systematic and in-depth introduction to a frontier topic, including but not limited to machine learning, high-dimensional statistics, operations research and optimization, and theoretical computer science.

