In this talk I will review the linear multi-armed bandit problem and discuss two main algorithmic design approaches. The first relies on the celebrated principle of "optimism in the face of uncertainty", which prescribes selecting at each step the action that maximizes performance under the most favorable problem parameters consistent with the current level of uncertainty. While the resulting algorithms are effective from a learning point of view, they become computationally expensive as the number of available actions grows. An alternative is a randomized algorithm, the so-called Thompson sampling, which drastically reduces the computational complexity. On the other hand, randomization poses theoretical challenges in controlling the negative effects of the stochastic choice of actions. I will show that even in this case it is crucial to preserve a minimum level of optimism to guarantee order-optimal performance. Finally, I will discuss possible extensions beyond the linear bandit problem.
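
To make the contrast concrete, here is a minimal Python sketch of the two action-selection rules for a linear bandit, assuming a ridge-regression estimate `theta_hat` with inverse design matrix `V_inv`. The names `select_ofu`, `select_ts`, `beta`, and `scale`, as well as the Gaussian surrogate posterior, are illustrative assumptions for this sketch, not details taken from the talk.

```python
import numpy as np

def select_ofu(actions, theta_hat, V_inv, beta):
    """Optimism in the face of uncertainty: pick the action maximizing
    the upper confidence bound <x, theta_hat> + beta * ||x||_{V^{-1}}.
    The index is nonlinear in x, so for large or combinatorially
    structured action sets this maximization becomes expensive."""
    def ucb(x):
        return x @ theta_hat + beta * np.sqrt(x @ V_inv @ x)
    return max(actions, key=ucb)

def select_ts(actions, theta_hat, V_inv, rng, scale=1.0):
    """Thompson sampling: draw one parameter from a Gaussian surrogate
    posterior N(theta_hat, scale^2 * V_inv), then act greedily. The
    greedy step is a *linear* maximization, which an efficient oracle
    can often solve even over structured action sets."""
    theta_tilde = rng.multivariate_normal(theta_hat, scale**2 * V_inv)
    return max(actions, key=lambda x: x @ theta_tilde)

# Tiny usage example: three two-dimensional actions, pre-data design
# matrix V = I (so V_inv = I).
rng = np.random.default_rng(0)
actions = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
           np.array([0.7, 0.7])]
theta_hat = np.array([0.4, 0.2])
V_inv = np.eye(2)
print(select_ofu(actions, theta_hat, V_inv, beta=1.0))
print(select_ts(actions, theta_hat, V_inv, rng))
```

In analyses of linear Thompson sampling, inflating the posterior (the `scale` factor above) is one standard way to ensure the sampled parameter is optimistic often enough; it is one possible reading of the "minimum level of optimism" the abstract refers to, and the precise mechanism discussed in the talk may differ.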