IDSIA Seminar by Francesco Faccio - Policy Optimization via Importance Sampling
27 November 2018
Galleria 1, 2nd floor, room G1-204 @12:00
Policy optimization is an effective Reinforcement Learning approach to solving continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires accounting for the variance of the objective function estimate. In this talk, we propose a novel, model-free policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation; then we define a surrogate objective function, which is optimized offline whenever a new batch of trajectories is collected. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.
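As a rough illustration of the idea in the abstract, the sketch below reuses samples drawn under one Gaussian (the "behavioral" distribution) to estimate the expected return under another (the "target"), and subtracts a penalty that grows as the two distributions move apart. It is a toy under stated assumptions, not the algorithm or the bound presented in the talk: the one-dimensional parameter-based setting, the function names, toy_return, the coefficient lam and the exact form of the penalty are all hypothetical choices made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def is_estimate(samples, returns, behav_mean, target_mean, std):
    """Importance-sampling estimate of the expected return under the target
    Gaussian, using samples that were drawn from the behavioral Gaussian."""
    total = 0.0
    for x, r in zip(samples, returns):
        weight = gaussian_pdf(x, target_mean, std) / gaussian_pdf(x, behav_mean, std)
        total += weight * r
    return total / len(samples)


def penalized_surrogate(samples, returns, behav_mean, target_mean, std, lam):
    """IS estimate minus a variance-style penalty (assumed form, for illustration).

    For two Gaussians with the same std, the second moment of the importance
    weights has the closed form exp((target_mean - behav_mean)^2 / std^2).
    """
    d2 = math.exp((target_mean - behav_mean) ** 2 / std ** 2)
    penalty = lam * math.sqrt(d2 / len(samples))
    return is_estimate(samples, returns, behav_mean, target_mean, std) - penalty


def toy_return(x):
    """Hypothetical return function, peaked at x = 1."""
    return -(x - 1.0) ** 2


# Collect one batch under the behavioral distribution ("online" phase).
rng = random.Random(0)
behav_mean, std, lam = 0.0, 1.0, 1.0
samples = [rng.gauss(behav_mean, std) for _ in range(5000)]
returns = [toy_return(x) for x in samples]

# Reuse the same batch to score candidate target means ("offline" phase).
candidates = [i / 10.0 for i in range(0, 21)]
best_mean = max(
    candidates,
    key=lambda m: penalized_surrogate(samples, returns, behav_mean, m, std, lam),
)
print("best candidate mean:", best_mean)
```

The penalty discourages target distributions that lie far from the one that generated the data, where the importance weights become unreliable; once the penalized objective stops improving, a new batch would be collected under the updated distribution.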

The speaker

Francesco Faccio is a Master's student in Mathematical Engineering at Politecnico di Milano. He is currently working as an intern at IDSIA, where he completed his Master's thesis. His main research interests include Reinforcement Learning, Recurrent Neural Networks and Bayesian Statistics. The presented work has been developed in collaboration with Alberto Maria Metelli, Matteo Papini and Marcello Restelli from Politecnico di Milano. It will be presented at the upcoming 32nd Conference on Neural Information Processing Systems (NIPS 2018), where it was selected for an oral presentation.


Registration is welcome

Pizza (or alternative food) and drinks will be offered at the end of the talk. If you plan to attend, please register in a timely fashion at the following link so that we will have no shortage of food: