In our work on shaping, with representative publications at ICML11-NDIL, AAMAS10, and K-CAP09, we introduced a framework called Training an Agent Manually via Evaluative Reinforcement (TAMER). The TAMER framework, shown above, is an approach to the Shaping Problem. It uses established supervised learning techniques to model a human trainer's reward function, and it bases its action selection on the learned model. TAMER is built around the insight that human reward, unlike MDP reward, already encodes a full judgement of the long-term desirability of recent behavior. Thus, greedy actions for a TAMER agent are those predicted to receive the most immediate human reward.
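The two ingredients described above can be illustrated with a minimal Python sketch. The first is a supervised model of human reward. The second is greedy action selection on that model's predictions. This is not the TAMER implementation from the cited papers. Several choices are assumptions made only for illustration:

- the model is linear, with hand-written one-hot features;
- it is trained by stochastic gradient descent on squared error;
- feedback delay between an action and the trainer's reward is ignored;
- the class name `TamerSketch`, the feature function, and the toy actions are all hypothetical.

```python
from typing import Callable, Hashable, List, Sequence

# Hypothetical feature function type: maps (state, action) to a feature vector.
FeatureFn = Callable[[Hashable, Hashable], Sequence[float]]


class TamerSketch:
    """Hypothetical TAMER-style agent (illustrative assumptions, not the original method).

    Learns H_hat(s, a), a linear estimate of the human trainer's reward, by
    supervised regression, and acts greedily on H_hat with no discounting.
    """

    def __init__(self, feature_fn: FeatureFn, actions: Sequence[Hashable],
                 num_features: int, learning_rate: float = 0.1) -> None:
        self.feature_fn = feature_fn
        self.actions = list(actions)
        self.weights: List[float] = [0.0] * num_features
        self.learning_rate = learning_rate

    def predict(self, state: Hashable, action: Hashable) -> float:
        """Predicted immediate human reward for taking `action` in `state`."""
        phi = self.feature_fn(state, action)
        return sum(w * x for w, x in zip(self.weights, phi))

    def update(self, state: Hashable, action: Hashable, human_reward: float) -> None:
        """One SGD step on the squared error between the trainer's reward and H_hat."""
        phi = self.feature_fn(state, action)
        error = human_reward - self.predict(state, action)
        self.weights = [w + self.learning_rate * error * x
                        for w, x in zip(self.weights, phi)]

    def act(self, state: Hashable) -> Hashable:
        """Greedy choice: the action with the highest predicted immediate human reward."""
        return max(self.actions, key=lambda a: self.predict(state, a))


# Toy usage (hypothetical): two actions, one-hot action features, state ignored.
def one_hot_features(state: Hashable, action: Hashable) -> List[float]:
    return [1.0, 0.0] if action == "left" else [0.0, 1.0]


agent = TamerSketch(one_hot_features, actions=["left", "right"], num_features=2)
agent.update(0, "right", 1.0)   # trainer approves of moving right
agent.update(0, "left", -1.0)   # trainer disapproves of moving left
chosen = agent.act(0)           # greedy on H_hat: "right"
```

Because the human's reward is treated as already accounting for long-term consequences, the sketch acts myopically. `act` maximizes predicted immediate reward rather than a discounted return, which mirrors the greedy rule stated in the paragraph above.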


Check out our videos of TAMER agents being trained.