■ Researchers
Beomjoon Kim
School of Computer Science, McGill University
Amir-massoud Farahmand
School of Computer Science, McGill University
Joelle Pineau
School of Computer Science, McGill University
Doina Precup
School of Computer Science, McGill University
■ Abstract
We propose a Learning from Demonstration (LfD) algorithm that leverages expert data even when the demonstrations are very few or inaccurate. We achieve this by combining expert data with reinforcement signals gathered through trial-and-error interactions with the environment. The key idea of our approach, Approximate Policy Iteration with Demonstration (APID), is that the expert’s suggestions are used to define linear constraints that guide the optimization performed by Approximate Policy Iteration. We prove an upper bound on the Bellman error of the estimate computed by APID at each iteration. Moreover, we show empirically that APID outperforms pure Approximate Policy Iteration, a state-of-the-art LfD algorithm, and supervised learning in a variety of scenarios, including when very few and/or suboptimal demonstrations are available. Our experiments include simulations as well as a real robot path-finding task.
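The abstract does not spell out the form of the linear constraints. As a rough illustration only, one plausible reading is a large-margin condition: at each demonstrated state, the action-value of the expert's action should exceed that of every other action by some margin, with a slack variable absorbing violations so that inaccurate demonstrations can be tolerated. Each such condition is linear in the action-values. The sketch below computes those slacks for a tabular action-value function; the function name `expert_margin_slacks`, the `margin` parameter, and the tabular representation are all assumptions made for this example, not details taken from the paper.

```python
def expert_margin_slacks(q_values, expert_pairs, margin=1.0):
    """Slack needed for each demonstrated (state, action) pair.

    Assumed constraint (hypothetical, for illustration):
        Q(s, a_expert) - max_{a != a_expert} Q(s, a) >= margin - slack,
        slack >= 0
    q_values: list of rows, q_values[s][a] is the action-value estimate.
    expert_pairs: list of (state_index, expert_action_index) tuples.
    """
    slacks = []
    for state, expert_action in expert_pairs:
        q_row = q_values[state]
        # Best competing action-value among the non-expert actions.
        best_other = max(q for a, q in enumerate(q_row) if a != expert_action)
        gap = q_row[expert_action] - best_other
        # Zero slack when the margin is met, positive otherwise.
        slacks.append(max(0.0, margin - gap))
    return slacks


# Two states, two actions; the expert chose action 0 in both states.
q_values = [[2.0, 0.5], [0.25, 0.5]]
expert_pairs = [(0, 0), (1, 0)]
slacks = expert_margin_slacks(q_values, expert_pairs)
```

In this toy example the first demonstration already satisfies the margin, while the second needs positive slack because the current estimate prefers the other action. A policy-evaluation step in this spirit would trade off the sum of such slacks against a Bellman-error term, which is how the demonstrations could steer the optimization without being treated as exact.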