
Learning a Low-level Skill

Just as young soccer players must learn to control the ball before learning any complex strategies, Soccer Server clients must also acquire low-level skills before exhibiting complex behaviors: the most sophisticated understanding of how to act as part of a team is useless without the ability to execute the necessary individual tasks. Although general agents can be assumed to possess basic domain-independent skills such as moving and sensing, there are always new skills to learn in a new domain. Acting as human coaches for our clients, we identified a low-level skill that is needed in the Soccer Server. Isolating a situation that requires this skill, we drilled the clients, providing the appropriate reinforcement, until they were able to learn to execute this skill reliably.

The low-level skill we identified as most essential to our Soccer Server clients was the ability to intercept a moving ball. This skill is needed in soccer-type frameworks generally: we taught clients a similar skill in a different simulator [21]. Intercepting a moving ball is considerably more difficult than moving to a stationary ball, both because the ball's movement is unpredictable (due to simulator noise) and because the client may need to turn and move in a direction from which it cannot see the ball (see Figure 5).

Intercepting a moving ball is a task that arises very frequently in the Soccer Server. Unless the ball has come to a complete stop without a player collecting it, this skill is a prerequisite for any kicking action. In particular, defenders must intercept shots and opponents' passes, while players must frequently ``intercept'' passes to them from teammates. In many of these situations, the ball is moving roughly towards the player trying to intercept it, which is the condition that causes the difficulty illustrated in Figure 5: the player may move to where the ball used to be while the ball moves past it. This problem arises primarily because players receive sensory information only at discrete intervals (every 250 msec).

Faced with two possible methods for equipping our players with the ability to intercept a moving ball--empirical and analytical--we chose the empirical method for its appropriateness to our Machine Learning paradigm. Rather than providing the clients with the ability to perform sophisticated geometric calculations, we provided the clients with a large number of training examples and used a supervised learning technique: Neural Networks (NNs).

Figure 5: If the defender moves directly towards the ball (left arrow), it will miss entirely. If the defender turns to move in the appropriate direction (right arrow), it may no longer be able to see the ball.
Figure 6: At the beginning of each trial, the defender starts 4 units from the goal, while the ball and shooter are placed randomly between 20 and 30 units from the defender.

The range of situations from which training examples were gathered is illustrated in Figure 6. For each training example, the shooter kicks the ball directly towards the defender with a fixed power. However, due to the noise in the simulator, the ball does not always move directly at the defender: if the defender remains still, the ball hits it only 35% of the time. Furthermore, if the defender keeps watching the ball and moving directly towards it, it is only able to stop the ball 53% of the time.

The defender's behavior during training is more complex than the shooter's. As we are using a supervised learning technique, it must first gather training data by acting randomly and recording the results of its actions. It does so as follows (BD = Ball's distance, BA = Ball's angle, TA = Turn angle -- the angle to turn after facing the ball):

While BD > 14, TURN(BA)
When BD <= 14, set TA = Random Angle between -45 and 45
Record BD, BA, previous BD, and TA
TURN(BA + TA)
DASH()
Record result (from coach)
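The data-gathering procedure above can be sketched in Python as a single per-cycle decision function. This is a minimal sketch, not the original client code: the name `defender_step`, the tuple encoding of the TURN and DASH commands, and the injected random generator are hypothetical; only the 14-unit threshold and the -45 to 45 range come from the procedure. Attaching the coach's result to the record is left to the caller.

```python
import random
from typing import List, Optional, Tuple

RANGE_THRESHOLD = 14.0  # BD at which the defender stops watching and acts
MAX_TURN = 45.0         # TA is drawn uniformly from [-MAX_TURN, MAX_TURN]


def defender_step(bd: float, ba: float, prev_bd: float,
                  rng: random.Random) -> Tuple[List[tuple], Optional[dict]]:
    """Return this cycle's commands and, once the ball is in range,
    a partial training record (the coach's label is added later)."""
    if bd > RANGE_THRESHOLD:
        # Ball still far away: just keep facing it.
        return [("turn", ba)], None
    # Ball in range: pick a random turn angle relative to the ball direction.
    ta = rng.uniform(-MAX_TURN, MAX_TURN)
    record = {"BD": bd, "BA": ba, "prevBD": prev_bd, "TA": ta}
    # Turn TA away from the ball, then dash.
    return [("turn", ba + ta), ("dash",)], record
```

Injecting the random generator keeps training runs reproducible when the same seed is reused.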

Until the ball is within a given range, the defender simply watches and faces the ball. Then, once the ball is in range, the defender turns a random angle (within a range) away from the ball and dashes. Of course the defender misses most (76%) of the time, but after about 750 positive examples, it is able to learn to perform much better (see below).

In order to automate the training process, a coach client is used. The coach ends a trial when the ball gets past the defender or when it starts moving back towards the shooter. In the latter case, the trial is labeled a SAVE. In the former case, it is labeled a GOAL if the ball is still between the goal posts and a MISS if it is heading wide of the goal. Only saves are considered positive results and thus used for training. At the end of the trial, the coach resets the positions of both players and the ball for another trial.
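The coach's labeling rule can be sketched as a small classifier over the ball's state. The geometry here is an assumption made for illustration: the goal line lies at x = 0 with the posts at y = -goal_half_width and +goal_half_width, the shooter is on the positive-x side, and "heading wide" is judged by extrapolating the ball's current straight-line path to the goal line. The function name `label_trial` is hypothetical.

```python
from typing import Optional, Tuple


def label_trial(ball_pos: Tuple[float, float], ball_vel: Tuple[float, float],
                defender_x: float, goal_half_width: float) -> Optional[str]:
    """Return "SAVE", "GOAL", "MISS", or None if the trial should continue."""
    x, y = ball_pos
    vx, vy = ball_vel
    if vx > 0:
        return "SAVE"  # ball has turned back towards the shooter
    if x >= defender_x:
        return None    # ball not yet past the defender: trial continues
    if vx == 0:
        y_at_line = y  # ball stopped behind the defender: use its current y
    else:
        # Extrapolate the straight-line path to the goal line x = 0.
        y_at_line = y + vy * (-x / vx)
    return "GOAL" if abs(y_at_line) <= goal_half_width else "MISS"
```

Only trials labeled "SAVE" would then be kept as training examples.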

The goal of learning is to allow the defender to choose the appropriate turn angle (TA) based upon the BD, BA, and previous BD. Thus, only the data from the saves during the training phase are useful (the NN is learning a continuous, not a binary, output). In order to learn the TA, we chose to use a Neural Network (NN). Other supervised learning techniques, such as memory-based learning, could also have worked (see below). After a small amount of experimentation with different NN configurations, we settled on a fully-connected net with 4 sigmoid hidden units and a learning rate of . The weights connecting the input and hidden layers used a linearly decreasing weight decay starting at .1%. We used a linear output unit with no weight decay. We trained for 3000 epochs. This configuration proved to be satisfactory for our task with no need for extensive tweaking of the network parameters.
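A network of the shape described above (3 inputs: BD, BA, previous BD; 4 sigmoid hidden units; 1 linear output for TA) can be sketched in plain Python. Several details are assumptions rather than the original setup: the learning rate value is not recoverable from the text, so `lr=0.05` is a placeholder; training is per-example gradient descent on squared error; and "linearly decreasing weight decay starting at .1%" is read as shrinking the input-to-hidden weights (not biases) by a factor that falls linearly from 0.001 to 0 over the epochs.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TurnAngleNet:
    """3 inputs (BD, BA, previous BD) -> 4 sigmoid hidden units -> linear TA."""

    def __init__(self, n_in: int = 3, n_hidden: int = 4, seed: int = 0):
        rng = random.Random(seed)
        # Each hidden row holds n_in input weights followed by a bias.
        self.w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                     for _ in range(n_hidden)]
        # Output weights: one per hidden unit, followed by a bias.
        self.w_ho = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def _hidden(self, x: Sequence[float]) -> List[float]:
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1])
                for row in self.w_ih]

    def predict(self, x: Sequence[float]) -> float:
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w_ho, h)) + self.w_ho[-1]

    def train(self, data: List[Tuple[Sequence[float], float]],
              epochs: int = 3000, lr: float = 0.05,
              decay0: float = 0.001) -> None:
        for epoch in range(epochs):
            # Decay coefficient falls linearly from decay0 towards 0.
            decay = decay0 * (1.0 - epoch / epochs)
            for x, target in data:
                h = self._hidden(x)
                y = sum(w * hj for w, hj in zip(self.w_ho, h)) + self.w_ho[-1]
                err = y - target  # gradient of 0.5 * err**2 w.r.t. y
                for j, hj in enumerate(h):
                    # Backpropagate through the linear output and sigmoid unit j
                    # (using the output weight before it is updated).
                    grad_h = err * self.w_ho[j] * hj * (1.0 - hj)
                    self.w_ho[j] -= lr * err * hj
                    row = self.w_ih[j]
                    for i, xi in enumerate(x):
                        row[i] -= lr * grad_h * xi
                    row[-1] -= lr * grad_h
                self.w_ho[-1] -= lr * err
            # Weight decay on input->hidden weights only (biases untouched);
            # the output layer has no weight decay.
            for row in self.w_ih:
                for i in range(len(row) - 1):
                    row[i] *= (1.0 - decay)
```

In practice the inputs would probably be rescaled before training, since raw distances can saturate the sigmoid units.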

In order to test the NN's performance, we ran 1000 trials with the defender using the output of the NN to determine its turn angle. The behaviors of the shooter and the coach were the same as during training. The results for NNs trained with different numbers of training examples are displayed in Figure 7. The misses are not included in the results since those are the shots that are far enough wide that the defender does not have much chance of even reaching the ball before it is past. The figure also records the percentage of shots on-goal (Saves+Goals) that the defender saved. Reasonable performance is achieved with only 300 save examples, and examples beyond about 750 do not improve performance. The defender is able to save almost all the shots despite the continual noise in the ball's movement.
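The on-goal save percentage excludes misses from the denominator, so it is saves / (saves + goals). A minimal helper, with a hypothetical name and made-up counts for the usage example:

```python
def on_goal_save_pct(saves: int, goals: int) -> float:
    """Percentage of on-goal shots (saves + goals) that were saved; misses excluded."""
    on_goal = saves + goals
    if on_goal == 0:
        raise ValueError("no on-goal shots recorded")
    return 100.0 * saves / on_goal
```

For example, hypothetical counts of 855 saves and 45 goals give `on_goal_save_pct(855, 45) == 95.0`, regardless of how many misses occurred.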

Figure 7: The defender's performance when using NNs trained with different numbers of positive examples. The last column of the table indicates the percentage of shots that were ``on goal'' that the defender saved.

In order to study the effect of noise in the ball's movement upon the defender's performance, we varied the amount of noise in the Soccer Server (the ball_rand parameter). Figure 8 shows the effect of varying noise upon the defender when it uses the trained NN (trained with 750 examples) and when it moves straight towards the ball.

Figure 8: The defender's performance when using NNs and moving straight with different amounts of ball noise.

The default value of noise is .05, meaning that on every simulator step, the true position of the ball is perturbed by a random amount between -.05 and .05, drawn from a uniform distribution over that range. The ``straight'' behavior always sets TA=0, causing the defender to go directly towards where it last saw the ball. Notice that with no ball noise, both the straight and learned behaviors are successful: the ball and the defender move straight towards each other. As the noise in the ball's motion increases, the advantage of using the learned interception behavior becomes significant. The advantage of the NN can also be seen with no noise if the shooter aims slightly (4 degrees) wide of the goal's center: the defender then succeeds 99% of the time when using the NN, and only 10% of the time when moving straight towards the ball.
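The noise model as described, together with the ``straight'' baseline, can be sketched as follows. Applying the uniform perturbation independently to each coordinate is an assumption (the text does not say how the perturbation is split across dimensions), and this follows the description above rather than the Soccer Server source; the function names are hypothetical.

```python
import random
from typing import Tuple


def perturb_ball(pos: Tuple[float, float], ball_rand: float,
                 rng: random.Random) -> Tuple[float, float]:
    """Add independent uniform noise in [-ball_rand, ball_rand] to each coordinate."""
    x, y = pos
    return (x + rng.uniform(-ball_rand, ball_rand),
            y + rng.uniform(-ball_rand, ball_rand))


def straight_turn_angle(bd: float, ba: float, prev_bd: float) -> float:
    """'Straight' baseline: ignore the inputs and always use TA = 0,
    i.e. head for where the ball was last seen."""
    return 0.0
```

With `ball_rand = 0` the ball's path is deterministic, which is the case where the straight baseline already succeeds when the shot is aimed at the defender.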

It appears that the NN solution allows the defender to intercept a moving ball as well as can be hoped for given the discrete sensory events and simulator noise. However, as mentioned above, and particularly because this problem may not be overly complex, there are other promising ways of approaching it. For example, since the NN weighs one of its inputs much more heavily than the other two, it appears that memory-based techniques would work quite well.

From a quick examination of the inputs and outputs of the trained NN, it appears that the NN focuses primarily upon the Ball's angle (BA). Consequently, we were curious to try a behavior that simply used a lookup table mapping BA to the typical output of the NN for that BA. We identified such outputs for BA values ranging from -7 to 7. Using this one-dimensional lookup table, the defender was able to perform almost as well as when using the full NN (see Table 1). Although the lookup table was built with the aid of a NN, the one-dimensional function could also easily be learned with memory-based methods.
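The memory-based alternative mentioned above can be sketched as a one-dimensional table filled directly from (BA, TA) save examples, rather than from the trained NN's outputs as in the experiment. The integer BA bins, averaging within a bin, clamping to the -7 to 7 range, and nearest-bin lookup are all assumptions chosen for illustration; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

BA_MIN, BA_MAX = -7, 7  # range of BA values covered by the table


def build_table(examples: Iterable[Tuple[float, float]]) -> Dict[int, float]:
    """Average the TA of the (BA, TA) examples falling in each integer BA bin."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ba, ta in examples:
        # Round to the nearest integer bin, clamped to the table's range.
        key = min(BA_MAX, max(BA_MIN, round(ba)))
        sums[key] += ta
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}


def lookup_ta(table: Dict[int, float], ba: float) -> float:
    """Return the TA stored for the filled bin nearest to BA."""
    key = min(table, key=lambda k: abs(k - ba))
    return table[key]
```

Nearest-bin lookup means out-of-range BA values simply reuse the outermost filled bin; linear interpolation between bins would be a natural refinement.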

We also were curious about how well the NN would compare to analytical methods. As a basis for comparison, we used a behavior constructed by Mike Bowling, an undergraduate student in the project, whose goal was to create the best possible analytic behavior. The resulting behavior computed the ball's motion vector from its 2 previous positions and multiplied this vector by 3, thus predicting the ball's position two sensory steps (500 msec) into the future. The defender's TA was then the angle necessary to move directly towards the end of the lengthened vector. Using this technique, the defender was also able to perform almost as well as it did when using the NN (see Table 1).
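The analytic behavior can be sketched as follows, reading "its 2 previous positions" as the two most recent observed ball positions and anchoring the lengthened vector at the older one, so that p_prev + 3(p_now - p_prev) = p_now + 2(p_now - p_prev), i.e. two sensory steps beyond the latest observation. Angles use the standard counterclockwise-positive convention in degrees, which may differ from the Soccer Server's own convention; the function name is hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def analytic_turn_angle(defender: Point, ball_prev: Point, ball_now: Point,
                        scale: float = 3.0) -> float:
    """TA in degrees, relative to facing the ball, towards the predicted point."""
    # Lengthen the motion vector from the older observation by `scale`.
    tx = ball_prev[0] + scale * (ball_now[0] - ball_prev[0])
    ty = ball_prev[1] + scale * (ball_now[1] - ball_prev[1])
    heading_to_target = math.atan2(ty - defender[1], tx - defender[0])
    heading_to_ball = math.atan2(ball_now[1] - defender[1],
                                 ball_now[0] - defender[0])
    # TA is measured relative to facing the ball; wrap into [-180, 180).
    ta = math.degrees(heading_to_target - heading_to_ball)
    return (ta + 180.0) % 360.0 - 180.0
```

For a ball travelling straight at the defender the predicted point lies on the same line, so TA is 0, matching the ``straight'' behavior in the noise-free case.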

Table 1: The defender's performance when using a NN, a one-dimensional lookup table, and an analytic method to determine the TA.

In this section we verified that a supervised learning technique was useful for learning a low-level skill that is needed by clients in the Soccer Server. Although there were other possible methods of attaining this skill, the method we chose performs at least as well as the alternatives we tested (Table 1). Furthermore, it is in keeping with our goal of building up learning clients by layering learned behaviors on top of each other.


Peter Stone
Mon Mar 31 12:26:29 EST 1997