
Related and Future Work

This work was inspired in large part by the Dynamo project at the University of British Columbia. In particular, Kanazawa worked within this framework to decide when to shoot and when to pass using purely analytical methods [3]. However, since no learning was involved, all of the routines had to be hand-coded.

Christiansen has done extensive work examining the difference between analytical and learning methods [2]. His recent work relates to learning actions in a single-player, continuous-valued environment with static obstacles [16].

Aha and Salzberg use a memory-based approach to learn a continuous function without modelling the physics of the system [1]. Their task, catching a ball, differs from ours in that it has more inputs, a continuous output, and sharp nonlinearities; however, they do not attempt to adapt to changes in the environment or to contend with an adversary.

Sutton and Whitehead discuss the distinction between off-line and online learning [14]. As they point out, online learning has the advantage of being able to cope with changing concepts. However, online techniques must be incremental so that they need not reprocess the entire training set every time a new example is added. Sutton and Whitehead distinguish between ``weakly incremental'' and ``strictly incremental'' techniques: the former require a bounded increase in memory and computation for each additional training example, while the latter require no increase at all. Like their examples of STAGGER [10] and connectionist learning methods, our adaptive memory is strictly incremental. Furthermore, unlike many memory-based algorithms (as pointed out by Moore [6]), our adaptive memory is able to learn changing concepts to a certain extent. However, it also captures the notion of resiliency discussed by Schlimmer [11]: the more training examples support a concept, the harder it is to unlearn that concept.
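As an illustration of the two properties named above, strict incrementality and resiliency, consider the minimal sketch below. It assumes a one-dimensional input discretized into a fixed number of slots, each holding a running mean of the outputs seen in that region together with a count. The class and method names (`SlotMemory`, `add`, `predict`) are hypothetical, and this is not the adaptive memory used in this work, only a toy structure that shares the two properties being discussed.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    count: int = 0     # number of training examples absorbed by this slot
    mean: float = 0.0  # running mean of their output values


class SlotMemory:
    """Hypothetical fixed-size memory over inputs in [lo, hi).

    Storage is allocated once and never grows, so each new example costs
    O(1) time and no extra memory (strictly incremental).
    """

    def __init__(self, n_slots: int, lo: float, hi: float) -> None:
        self.lo, self.hi = lo, hi
        self.slots = [Slot() for _ in range(n_slots)]

    def _index(self, x: float) -> int:
        # Map x to a slot, clamping out-of-range inputs to the end slots.
        i = int((x - self.lo) / (self.hi - self.lo) * len(self.slots))
        return min(max(i, 0), len(self.slots) - 1)

    def add(self, x: float, y: float) -> None:
        # Incremental mean update: the step size 1/count shrinks as a slot
        # accumulates support, so well-supported slots change slowly
        # (resiliency), while sparsely supported slots adapt quickly.
        s = self.slots[self._index(x)]
        s.count += 1
        s.mean += (y - s.mean) / s.count

    def predict(self, x: float):
        # Returns None for a region with no training examples yet.
        s = self.slots[self._index(x)]
        return s.mean if s.count else None
```

For example, after nine examples with output 1.0 and one with output 0.0 fall into the same slot, that slot predicts about 0.9; a slot that saw a single 1.0 followed by a single 0.0 predicts 0.5. The contradicting example thus moves the well-supported slot far less, and the number of examples contributing to each prediction varies from region to region.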

Previous work relating to learning changing concepts is reported in [4, 6, 9, 11]. Salganicoff's work is particularly interesting in that it introduces the notion of forgetting training examples based not on how long ago they were introduced, but instead on how closely they correlate with recent examples [9]. We will investigate using Salganicoff's ideas in our future work. Complementary to algorithms that handle changing concepts, there has been some research on adaptive algorithms that change the number of ``neighbors'' that are used to predict the value of an unseen instance [15]. Notice that our adaptive memory automatically captures this notion of locally adaptive algorithms by incorporating varying numbers of training examples in the different memory slots based on the number of past examples in that region.

Future work on our research agenda includes simultaneous learning of the defender and the controlling agent in an adversarial context. We will also explore learning methods with several agents where teams are guided by planning strategies. In this way we will simultaneously study cooperative and adversarial situations using reactive and deliberative reasoning.


Peter Stone
Mon Dec 11 15:42:40 EST 1995