After training, a neural network could be used very cheaply--just a
single forward pass at each decision point--to decide when to begin
accelerating. Notice that within a single trial, only one input to
the neural network varied: the Ball Distance decreased
as the ball approached the Contact Point. Thus, the
output of the neural network tended to vary fairly regularly. As the ball began
approaching, the output began increasing first slowly and then
sharply. After reaching its peak, the output began decreasing slowly
at first and then sharply. The optimal time for the shooter to begin
accelerating was at the peak of this function. However, since the
function peaked at different values on different trials, we used the
following 3-input neural network shooting policy:
Begin accelerating when Output > .6 AND Output < Previous output - .01.
Requiring that Output > .6 ensured that the shooter would only start moving if it "believed" it was more likely to score than to miss. The condition Output < Previous output - .01 became true when the output of the neural network was just past its peak. Requiring that the output decrease by at least .01 from the previous output ensured that the decrease was not due simply to sensor noise.
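The trigger rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first condition is read as a greater-than comparison against .6, the function names `should_accelerate` and `first_trigger_step` are hypothetical, and the sequence of outputs stands in for the network's forward pass at successive decision points.

```python
from typing import Iterable, Optional

THRESHOLD = 0.6  # minimum network output required before the shooter may start
MIN_DROP = 0.01  # required decrease from the previous output (noise margin)


def should_accelerate(output: float, previous_output: Optional[float]) -> bool:
    """Return True when both conditions of the shooting policy hold."""
    if previous_output is None:
        # Assumption: at the first decision point there is no previous
        # output to compare against, so the shooter keeps waiting.
        return False
    return output > THRESHOLD and output < previous_output - MIN_DROP


def first_trigger_step(outputs: Iterable[float]) -> Optional[int]:
    """Return the index of the first decision point at which the policy
    fires, or None if it never fires during the trial."""
    previous = None
    for step, output in enumerate(outputs):
        if should_accelerate(output, previous):
            return step
        previous = output
    return None


# Hypothetical output trace for one trial: a slow rise, a sharp rise,
# a peak near .81, and then a decline.
trace = [0.20, 0.35, 0.62, 0.81, 0.805, 0.78, 0.50]
trigger = first_trigger_step(trace)
```

In this made-up trace the drop from .81 to .805 is smaller than .01 and is ignored as possible noise, so the policy fires one step later, at the output of .78. A trial whose outputs never exceed .6 never triggers the shooter.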
Using this learned 3-input neural network shooting policy, the shooter scored 96.5% of the time. The results reported in this section are summarized in Table 2.
Table 2: Results before and after learning for fixed ball motion.
Even more important than the high success rate achieved when using the learned shooting policy was the fact that the shooter achieved the same success rate in each of the four symmetrical reflections of the training situation (the four action quadrants). With no further training, the shooter was able to score from either side of the goal on either side of the field. Figure 4(a) illustrates one of the three symmetrical scenarios not used in training. The world description used as input to the neural network contained no information specific to the location on the field, but instead captured only information about the relative positions of the shooter, the ball, and the goal. Thanks to these flexible inputs, training in one situation was applicable to several other situations.
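To see why purely relative inputs transfer across the four quadrants, consider the following minimal sketch. It makes several assumptions that are not taken from our implementation: the field is centered at the origin, the reflections are sign flips of the x and y coordinates, and the two distance features computed by the hypothetical function `relative_features` merely stand in for the network's actual inputs.

```python
import math
from itertools import product


def reflect(point, flip_x, flip_y):
    """Mirror a point across the field's axes (field assumed centered at 0, 0)."""
    x, y = point
    return (-x if flip_x else x, -y if flip_y else y)


def distance(a, b):
    """Euclidean distance between two points in the plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def relative_features(shooter, ball, contact_point):
    """Hypothetical location-free features: only distances between objects."""
    return (distance(ball, contact_point), distance(shooter, contact_point))


def features_in_all_quadrants(shooter, ball, contact_point):
    """Compute the features for the original layout and its three reflections."""
    results = []
    for flip_x, flip_y in product([False, True], repeat=2):
        results.append(
            relative_features(
                reflect(shooter, flip_x, flip_y),
                reflect(ball, flip_x, flip_y),
                reflect(contact_point, flip_x, flip_y),
            )
        )
    return results


# Made-up coordinates for one training layout.
all_features = features_in_all_quadrants((-3.0, 1.0), (-5.0, 4.0), (-1.0, 2.0))
```

All four feature tuples are identical, so a network that sees only such inputs cannot distinguish the training quadrant from its reflections and behaves the same in each. Note that a signed quantity, such as a signed angle, would change sign under reflection and would need to be expressed relative to the layout to preserve this property.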