
Learning a Higher-level Decision

Once young soccer players have learned how to control the ball, they are ready to use their skills to start learning how to make decisions on the field and playing as part of a team. Similarly, our clients can use their learned ball-interception skill to exhibit a more complex behavior: passing. Passing requires action by two different clients. A passer must kick the ball towards the receiver, who must collect the ball. Since the receiver's task is identical to that of the defender in the previous section, the clients can (and do) use the same trained NN.

Although the execution of a pass in the open field is not difficult given the receiver's ball-interception skill, it becomes more complicated in the presence of defenders. If in the proper position, a defender (also equipped with the same ball-interception skill) may be able to intercept the ball before it reaches the receiver. Thus, the passer is faced with the task of assessing the likelihood that a pass to a particular receiver will succeed. For example, in Figure 9 the left-most teammate may be able to receive a pass, while the two directly to the right are much less likely to be able to do so. The higher-level decision that our clients learned was whether or not to pass to a given teammate.

Just as this behavior builds upon the interception skill, higher-level behaviors can be built upon this knowledge of when a pass will succeed. Such knowledge can contribute to the decision of which player to pass to or whether to pass, dribble, or shoot.

When deciding whether or not to make a pass, the passer has many possible features of the scenario at its disposal. When many features are available, it can be very difficult to pick out the ones that are relevant for building an analytical model. Rather than going through and filtering the attributes by hand, we chose to use a learning method that is capable of determining for itself which attributes to use. In particular, we used Decision Trees (DTs).

In order to gather the training data, we again defined a constrained situation and used a coach client to monitor the trials. Since passing requires coordination of the passer and the receiver, each trial was somewhat involved:

The coach randomly placed the players (Figure 9).
The passer announced its intention to pass (Figure 9).
The receivers replied with their views of the field when ready to receive (Figure 10).
The passer chose a receiver randomly during training, or with a DT during testing (Figure 11).
The passer recorded a large number of attributes describing the trial (see below).
The passer announced who it was passing to (Figure 12).
The receiver and 4 defenders attempted to get the ball using the learned ball-interception skill (Figure 13).
The coach classified the example as a SUCCESS if the receiver managed to pass the ball back toward the passer; a FAILURE if one of the defenders cleared the ball to a side; or a MISS if the receiver and the defenders failed to intercept the ball (Figure 13).
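The labeling and receiver-choice steps of a trial can be sketched as follows. This is only an illustration under simplifying assumptions: the names `Outcome`, `classify_trial`, and `pick_receiver` are hypothetical, and the coach's observation is reduced to a single value saying who ended up with the ball, which the source does not specify.

```python
import random
from enum import Enum
from typing import Callable, Optional, Sequence


class Outcome(Enum):
    SUCCESS = "SUCCESS"   # receiver got the ball and passed it back
    FAILURE = "FAILURE"   # a defender cleared the ball to a side
    MISS = "MISS"         # nobody intercepted the ball


def classify_trial(ball_taken_by: Optional[str]) -> Outcome:
    """Label a trial from a simplified observation (an assumption).

    ball_taken_by is 'receiver', 'defender', or None when no player
    intercepted the ball.
    """
    if ball_taken_by == "receiver":
        return Outcome.SUCCESS
    if ball_taken_by == "defender":
        return Outcome.FAILURE
    if ball_taken_by is None:
        return Outcome.MISS
    raise ValueError(f"unexpected observation: {ball_taken_by!r}")


def pick_receiver(
    candidates: Sequence[str],
    training: bool,
    dt_choice: Optional[Callable[[Sequence[str]], str]] = None,
) -> str:
    """Choose randomly during training, or defer to a DT-based chooser."""
    if training:
        return random.choice(list(candidates))
    if dt_choice is None:
        raise ValueError("a DT-based chooser is required during testing")
    return dt_choice(candidates)
```

Keeping the random choice and the DT choice behind one function mirrors the statement that all other behaviors were identical in training and testing.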

Figure 9: At the beginning of a trial, the passer is placed behind the ball. 3 teammates and 4 opponents are placed randomly within the region indicated by the dashed line, while 2 other players from each team are placed randomly on the field. In the following figures, the players involved in the play are enlarged for presentation purposes. When the passer sees that it has the ball, it announces its intention to pass. Its goal is to assess the likelihood of a pass to a given teammate succeeding.
Figure 10: When the receivers are facing the ball, they tell the passer what the world looks like to them. The passer can use the transmitted data to help it assess the likelihood that each receiver would successfully receive a pass. The data includes distances and angles to the other players as well as some counts of players within given distances and angles.

Figure 11: During training, the passer chooses its receiver randomly. During testing, it uses a DT to evaluate the likelihood that a pass to each of the teammates would succeed. It passes to the most likely receiver (Receiver 2 in this case).
Figure 12: After choosing its receiver, the passer announces its decision so that the receiver knows to expect the ball and the other teammates can move on to other behaviors. In our experiments, the non-receivers remain stationary.

Figure 13: Finally, the coach records the result of the pass.

The key part of gathering training examples was the passer's recording of the attributes describing the trial. Rather than restricting the number of attributes, we capitalized on the DT's ability to filter out the irrelevant ones. Thus, we gathered a total of 174 attributes (in addition to the coach's label) for each trial, half each from the passer's and the receiver's perspective. The attributes from the receiver's perspective were communicated to the passer before it had to decide which player to pass to. The attributes, all continuous, available to the DT were:

Distance and Angle to the receiver (2);
Distance and Angle to other teammates (up to 9) sorted by angle from the receiver (18);
Distance and Angle to opponents (up to 11) sorted by angle from the receiver (22);
Counts of teammates, opponents, and players within given distances and angles of the receiver (45);
Distance and Angle from receiver to teammates (up to 10) sorted by distance (20);
Distance and Angle from receiver to opponents (up to 11) sorted by distance (22);
Counts of teammates, opponents, and players within given distances and angles of the passer from the receiver's perspective (45);

Whenever fewer than the maximum number of players were visible, the remaining attributes were marked as unknown.
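The attribute counts above can be checked, and the padding with unknown values illustrated, with a short sketch. The group names, the `UNKNOWN` marker, and the helper `pad_players` are hypothetical; only the group sizes come from the list above.

```python
from typing import List, Optional, Tuple

# (hypothetical group name, number of attributes), in the order listed above.
ATTRIBUTE_GROUPS = [
    ("passer_receiver_dist_ang", 2),
    ("passer_teammates_dist_ang", 18),    # up to 9 teammates x 2
    ("passer_opponents_dist_ang", 22),    # up to 11 opponents x 2
    ("passer_counts", 45),
    ("receiver_teammates_dist_ang", 20),  # up to 10 teammates x 2
    ("receiver_opponents_dist_ang", 22),  # up to 11 opponents x 2
    ("receiver_counts", 45),
]

TOTAL_ATTRIBUTES = sum(size for _, size in ATTRIBUTE_GROUPS)
PASSER_ATTRIBUTES = sum(
    size for name, size in ATTRIBUTE_GROUPS if name.startswith("passer")
)

UNKNOWN = None  # placeholder for an attribute of a player that was not visible


def pad_players(
    visible: List[Tuple[float, float]], max_players: int
) -> List[Optional[float]]:
    """Flatten (distance, angle) pairs and fill unseen slots with UNKNOWN."""
    if len(visible) > max_players:
        raise ValueError("more players than attribute slots")
    flat: List[Optional[float]] = []
    for distance, angle in visible:
        flat.extend([distance, angle])
    flat.extend([UNKNOWN] * (2 * (max_players - len(visible))))
    return flat
```

The sums confirm the figures in the text: 174 attributes in total, 87 of them from the passer's perspective and 87 from the receiver's.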

The goal of learning is to use these attributes to predict whether a pass to the given receiver will lead to a SUCCESS, a FAILURE, or a MISS. For training, we used standard off-the-shelf C4.5 code with all of the default parameters [14]. We gathered a total of 5000 training examples, 51% of which were successes, 42% of which were failures, and 7% of which were misses.
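The source does not show the files that were fed to C4.5. As a hedged sketch, the helpers below assume the usual C4.5 input conventions: a names file that lists the classes on its first line and declares each attribute as continuous, and a data file with one comma-separated case per line, the class last, and `?` for an unknown value. The function names and attribute names are hypothetical.

```python
from typing import Optional, Sequence

CLASSES = ["SUCCESS", "FAILURE", "MISS"]


def names_file(attribute_names: Sequence[str]) -> str:
    """Build the text of a C4.5-style names file (assumed format)."""
    lines = [", ".join(CLASSES) + "."]
    lines += [f"{name}: continuous." for name in attribute_names]
    return "\n".join(lines) + "\n"


def data_line(values: Sequence[Optional[float]], label: str) -> str:
    """Build one C4.5-style data line; None is written as '?' (unknown)."""
    if label not in CLASSES:
        raise ValueError(f"unknown class label: {label!r}")
    fields = ["?" if value is None else repr(float(value)) for value in values]
    return ",".join(fields + [label])
```

With 174 attributes per case, each data line would carry 174 value fields followed by the coach's label.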

Training on this data produced a pruned tree with 87 nodes and a 26% error rate on the training set. The tree is shown in Figure 14. All of the attributes starting with "passer" are from the passer's perspective. Notice that these are used much more frequently than the attributes from the receiver's perspective. This suggests that the trained tree could remain comparably effective when the passer must decide without any input from the potential receivers. The first node in the tree tests for the number of opponents within 6 degrees of the receiver from the passer's perspective. If there are any, the tree predicts that the pass will fail. Otherwise, the tree moves on to the second node, which tests the angle of the first opponent. Since the passer sorts the opponents by angle, this is the closest opponent to the receiver in terms of angle from the passer's perspective. If there is no opponent within 13 degrees of the receiver, the tree predicts success. Otherwise it goes on to deeper nodes in the tree.
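The two top-level tests just described can be written out as a small function. This is a sketch of those two nodes only, not of the full 87-node tree. The function name, the use of an absolute angle, the strict comparison at 13 degrees, and the treatment of an unknown angle as "no opponent" are all assumptions.

```python
from typing import Optional


def top_of_tree(
    opponents_within_6_deg: int, first_opponent_angle: Optional[float]
) -> Optional[str]:
    """Apply the first two tests of the tree described in the text.

    Returns 'FAILURE' or 'SUCCESS' when one of the two nodes decides,
    and None when the decision is left to deeper nodes (not modeled here).
    """
    # Node 1: any opponent within 6 degrees of the receiver, as seen
    # by the passer, leads to a predicted failure.
    if opponents_within_6_deg > 0:
        return "FAILURE"
    # Node 2: the angularly closest opponent lies more than 13 degrees
    # from the receiver (or no opponent is visible, by assumption).
    if first_opponent_angle is None or abs(first_opponent_angle) > 13.0:
        return "SUCCESS"
    return None
```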

Figure 14: The trained decision tree. Some subtrees with fewer cases covered have been removed for purposes of presentation. Attributes starting "passer" are from the passer's perspective. Attributes starting "receiver" are from the receiver's perspective. For example, "receiver players dist8 ang12" is the number of players that the receiver sees within a distance of 8 and angle of 12 from the passer.

In order to test the DT's performance we ran 5000 trials with the passer using the DT to choose the receiver. All other behaviors were the same as during training. Since the DT returns a confidence estimate with its classification, the passer can choose the best receiver candidate even if more than one is classified as likely to be successful. If the tree predicts a failure for all three receivers, the passer can select the one whose failure prediction has the lowest confidence. Notice that during testing, the passer must pass, while in a game situation the passer would be given the option to dribble or shoot instead.
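The selection rule in this paragraph can be sketched as below. The representation of a prediction as a (label, confidence) pair and the name `choose_receiver` are hypothetical. The sketch also assumes that a predicted MISS is treated like a predicted FAILURE, which the text does not state.

```python
from typing import Dict, Tuple

Prediction = Tuple[str, float]  # (predicted label, confidence in that label)


def choose_receiver(predictions: Dict[str, Prediction]) -> str:
    """Pick a receiver from per-teammate DT predictions.

    Prefer the predicted SUCCESS with the highest confidence. If no
    teammate is predicted to succeed, fall back to the teammate whose
    negative prediction has the lowest confidence.
    """
    if not predictions:
        raise ValueError("no candidate receivers")
    successes = {
        receiver: confidence
        for receiver, (label, confidence) in predictions.items()
        if label == "SUCCESS"
    }
    if successes:
        return max(successes, key=successes.get)
    return min(predictions, key=lambda receiver: predictions[receiver][1])
```

For instance, with predicted successes at confidence 0.7 and 0.85 and one predicted failure, the teammate at 0.85 is chosen; with three predicted failures, the least confident failure wins.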

We compiled results sorted by the DT's confidence in the success of the pass to the chosen receiver (see Table 2). The largest number of passes were classified as successes with confidence between .7 and .8, with another large portion classified as successes with confidence between .8 and .9. Overall, the success rate of 65% is much better than the 51% success rate obtained when a receiver was chosen randomly. However, this result was obtained under a condition of forced passing: the passer was required to pass the ball during all trials. Notice that if the passer wanted to be fairly sure of success, it could pass only when the DT predicted success with confidence greater than .8. The resulting 79% success rate approaches the limit imposed by the success rate of the ball-interception skill. When the testing is repeated with no defenders to intercept the ball, the success rate is 86%.
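A thresholded policy of this kind, and the way a success rate restricted to high-confidence passes might be computed from logged trials, can be sketched as follows. The function names and the (confidence, outcome) record format are hypothetical, and no data from the experiments is reproduced here.

```python
from typing import Iterable, Tuple


def should_pass(label: str, confidence: float, threshold: float = 0.8) -> bool:
    """Pass only on a predicted SUCCESS whose confidence exceeds the threshold."""
    return label == "SUCCESS" and confidence > threshold


def success_rate(
    trials: Iterable[Tuple[float, str]], threshold: float = 0.0
) -> float:
    """Percentage of SUCCESS outcomes among trials above a confidence threshold.

    Each trial is (confidence in predicted success, observed outcome).
    """
    kept = [outcome for confidence, outcome in trials if confidence > threshold]
    if not kept:
        raise ValueError("no trials above the threshold")
    return 100.0 * sum(outcome == "SUCCESS" for outcome in kept) / len(kept)
```

Raising the threshold trades coverage for reliability: fewer passes are attempted, but a larger share of them should succeed.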

Table 2: The results of 5000 trials during which the passer used the DT to choose the receiver. Overall results are given as well as a breakdown by the passer's confidence prior to the pass. The passer was forced to pass even if it predicted failures for all 3 teammates. In that case, it passed to the teammate with the lowest likelihood of failure. Results are given in percentages of the number of such cases (shown in parentheses).

With all the different attributes to choose from, it was not obvious how to construct an analytic heuristic for the passer to use when choosing a receiver. However, we needed some comparison other than the passer's random choice during training. A reasonable improvement over the random choice is to pass to the closest teammate. For this reason, we compared the DT decision with the closest teammate heuristic.
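For comparison, the closest teammate heuristic needs only the passer's distance to each candidate. A minimal sketch, with a hypothetical function name and input format:

```python
from typing import Dict


def closest_teammate(distances: Dict[str, float]) -> str:
    """Return the teammate at the smallest distance from the passer."""
    if not distances:
        raise ValueError("no candidate receivers")
    return min(distances, key=distances.get)
```

Unlike the DT, this rule returns only a choice and no estimate of how likely the pass is to succeed.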

Over 5000 trials, the closest teammate heuristic produced a success rate of 64%. Although this is close to the overall DT success rate of 65%, it is significantly lower than the 79% success rate the passer can achieve with the DT when given the option of not passing. Furthermore, the closest teammate heuristic gives no way of estimating the likelihood that a pass will succeed. It simply postulates that, given a choice, the passer should pass to the closest teammate. Since the likelihood estimation is the true goal of our learning in this section, there is a clear advantage to using the DT method. When deciding whether to pass, dribble, or shoot, the knowledge of whether or not a given pass is likely to succeed will be extremely useful.

In this section, we demonstrated that a higher-level decision could be built upon the low-level skill learned in the previous section. Using a DT, our clients learned to judge the likelihood that a pass to a given receiver would be successfully received. This judgement represented a second layer in our quest to build intelligent Soccer Server clients by layered learning.


Peter Stone
Mon Mar 31 12:26:29 EST 1997