This page contains links to movies (in the form of GIF animations)
depicting evolved robot controllers controlling simulated
Khepera-like robots in a real-time competitive task. The controllers
were evolved using the
NEAT method for evolving neural network topologies and
weights using a genetic algorithm. All of the movies
depict evolved neuro-controllers from a single run of hundreds of
generations. The movies show the robots' behavior in a task, and
also show the actual evolved neural network controllers
with neurons firing in real-time, so you can see what the
robots are "thinking" as they compete.
The main point of the experiment is to show that evolving neural networks using NEAT allows a new, more powerful kind of competitive coevolution to take place. This powerful form of coevolution is called continual competitive coevolution. The idea is that because NEAT allows networks to add structure indefinitely, a competing population in competitive coevolution should always be able to supplement a strategy by complexifying that strategy through the addition of new structure. The hope is that increasingly sophisticated strategies will utilize increasingly complex structures, preventing the arms race from stagnating.
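The complexification step described above can be sketched as NEAT's add-node mutation, which splits an existing connection in two. The genome representation below is a simplified stand-in (the real NEAT genome also tracks innovation numbers for crossover):

```python
import random

def add_node_mutation(connections, next_node_id):
    """Split a random enabled connection by inserting a new hidden node.

    `connections` is a list of dicts with keys 'in', 'out', 'weight',
    'enabled' -- a simplified stand-in for a NEAT genome.
    """
    conn = random.choice([c for c in connections if c['enabled']])
    conn['enabled'] = False  # disable the old direct connection
    # The incoming link gets weight 1.0 and the outgoing link keeps the
    # old weight, so the network's behavior is initially nearly unchanged.
    connections.append({'in': conn['in'], 'out': next_node_id,
                        'weight': 1.0, 'enabled': True})
    connections.append({'in': next_node_id, 'out': conn['out'],
                        'weight': conn['weight'], 'enabled': True})
    return next_node_id + 1
```

Because the new node initially preserves the old behavior, structure can be added indefinitely without destroying an existing strategy, which is what allows a strategy to be supplemented rather than replaced.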
The competition

The competitive task is challenging. The robots always start out in a world with exactly the same configuration. Nine pieces of food are placed in the world, with the two robots in the center. The image below shows what the world looks like. Please adjust your browser so that the entire image is visible on your screen:
On the left of the image is the world where the robots collect food. On the right are the robots' neural network controllers. The yellow robot is controlled by the upper neural network, and the red robot is controlled by the lower network.
Each network has 3 numbers to its left. These numbers represent:
Whenever a robot encounters a piece of food, its energy goes up by 500. At the same time, energy decreases as a robot travels; the farther it moves, the more energy it loses. There is no limit to how low energy can go; it can easily sink below zero.
The only way for a robot to win the competition is to collide with its opponent while its energy is higher than its opponent's. This rule makes the game interesting, because the predator-prey relationship can constantly shift back and forth depending on which food has been eaten. Energy is used for no other purpose than to determine who kills whom in a collision; it does not affect a robot's ability to move. Moving can still be risky when energy is low, especially when no food is available, since the robot with lower energy can be killed. As long as a robot has higher energy, it is safe.
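To make the rules concrete, here is a minimal sketch of the energy and collision logic described above. The per-distance movement cost is an assumed constant; the text only states that energy decreases with distance traveled:

```python
FOOD_ENERGY = 500  # from the rules: +500 per piece of food

def update_energy(energy, distance_moved, ate_food):
    """Energy rises by 500 per food piece and falls with distance moved.
    There is no floor, so energy can sink below zero."""
    MOVE_COST = 1.0  # hypothetical cost per unit distance
    energy -= MOVE_COST * distance_moved
    if ate_food:
        energy += FOOD_ENERGY
    return energy

def collision_winner(energy_a, energy_b):
    """On collision, the robot with strictly higher energy kills the other."""
    if energy_a > energy_b:
        return 'A'
    if energy_b > energy_a:
        return 'B'
    return None  # equal energy: no winner (assumed)
```

Note how the win condition depends only on the *relative* energy at the moment of collision, which is exactly why the predator and prey roles can shift back and forth during a game.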
The rules of the game encourage an unusual phenomenon to evolve: switching behavior. Robots must learn to change their goals quickly depending on which robot has the most energy at any given time. A robot with lower energy may decide to stay put and hope its opponent wastes energy by running around, or it may try to find more food. As you can imagine, complex strategies arise. The main reason that this scenario leads to complexity is that unlike in standard predator-prey tasks, both the predator and the prey are evolving to outwit each other, and networks must be capable of playing both roles.
Understanding the Movies

In the image above, the world contains two robots, each surrounded by two concentric rings. The rings represent rings of range sensors, which work like Khepera range sensors. However, one ring of sensors can detect robots, and the other can detect food. Each ring has 4 sensors with slightly overlapping arcs of sensitivity. The inner ring contains the food sensors, and the outer ring holds the robot sensors. When you watch the movies, you can actually see the sensor activation levels on the rings themselves, and you can see the sensors swiveling as the robots turn. Sensor activations are depicted as squares of different sizes. Note that the robot that currently has the higher energy level has yellow sensor rings, whereas the robot with lower energy has black sensor rings. This way, it is easy to see who could be playing predator at any given time.
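As an illustration of how such a ring of overlapping range sensors might compute its activations, here is a hedged sketch. The sensor count and slight overlap match the description above, but the maximum range, overlap amount, and activation scaling are assumptions, not the simulator's actual values:

```python
import math

def ring_activations(robot_pos, robot_heading, target_pos,
                     n_sensors=4, arc=math.pi / 2 * 1.1, max_range=100.0):
    """Activation of each range sensor in a ring for a single target.

    Sensors are evenly spaced around the robot and rotate with it;
    arcs overlap slightly (arc > 2*pi / n_sensors). Activation grows
    as the target gets closer.
    """
    dx = target_pos[0] - robot_pos[0]
    dy = target_pos[1] - robot_pos[1]
    dist = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - robot_heading  # angle relative to robot
    acts = []
    for i in range(n_sensors):
        center = 2 * math.pi * i / n_sensors
        # wrap the angular difference into [-pi, pi]
        diff = math.atan2(math.sin(bearing - center),
                          math.cos(bearing - center))
        if abs(diff) <= arc / 2 and dist <= max_range:
            acts.append(1.0 - dist / max_range)  # closer -> stronger
        else:
            acts.append(0.0)
    return acts
```

Because the arcs overlap, a target near a sensor boundary activates two adjacent sensors at once, which gives the network finer directional information than four disjoint arcs would.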
The sensors map to the inputs of the neural networks on the right of the image. The inputs are at the bottom of each network:
Each network has two motor outputs (at the top). The outputs represent the force to be applied to the left and right motors. Each motor drives a wheel. Thus, running the left motor alone causes a turn to the right, and so on. Each output has a number above it representing its current activation on a scale of 0 to 100,000 (actual outputs are between 0 and 1).
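The differential drive described above (left motor alone turns the robot right) can be sketched as a simple integration step. The wheel base, time step, and speed scaling are illustrative constants, not the simulator's actual parameters:

```python
import math

def differential_drive_step(x, y, heading, left, right,
                            wheel_base=1.0, dt=0.1, speed=10.0):
    """One integration step of a two-wheeled robot.

    `left` and `right` are motor outputs in [0, 1]. Heading is measured
    counterclockwise, so running only the left motor (omega < 0) turns
    the robot to the right.
    """
    v = speed * (left + right) / 2.0             # forward speed
    omega = speed * (right - left) / wheel_base  # counterclockwise turn rate
    heading += omega * dt
    x += v * math.cos(heading) * dt
    y += v * math.sin(heading) * dt
    return x, y, heading
```

Equal outputs drive the robot straight ahead; any imbalance between the two outputs produces a turn, so the network steers purely by modulating the difference between its two motor activations.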
The neural networks depict evolved topologies. They are composed of:
Since NEAT evolves arbitrary topologies, networks can become quite complex. The visualizations are designed to display the networks as clearly as possible.
The experiment

As usual, NEAT begins with zero-hidden-node networks and evolves topologies and links from there. If you want to know more details about how the NEAT method works, see our paper. The initial generation contained two populations, A and B, all with uniform zero-hidden-node topologies. A and B were pitted against each other in a competitive coevolution host/parasite framework (see "New Methods for Competitive Coevolution", Rosin and Belew, 1996). As the generations progressed, an arms race built up between A and B. The results show that increasingly sophisticated structures continually arose (over hundreds of generations), with each newly dominant strategy containing significantly more structure than the previous one. Also, there was no circularity in the dominance relationships.
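A skeleton of a host/parasite evaluation loop might look like the following. Here `evaluate` and `evolve` are placeholders for NEAT's own competition and reproduction, and the opponents are sampled at random for simplicity; the actual method of Rosin and Belew selects the best and most distinguishing parasites rather than random ones:

```python
import random

def coevolve(pop_a, pop_b, evaluate, evolve, generations=10, n_parasites=4):
    """Skeleton of host/parasite competitive coevolution.

    Each host in one population is scored against a sample of
    'parasites' from the other population, then both populations are
    evolved from those scores. `evaluate(host, parasite)` returns 1 if
    the host wins, else 0; `evolve(pop, fitnesses)` produces the next
    generation.
    """
    for _ in range(generations):
        for hosts, parasites in ((pop_a, pop_b), (pop_b, pop_a)):
            fitnesses = []
            for host in hosts:
                opponents = random.sample(parasites,
                                          min(n_parasites, len(parasites)))
                fitnesses.append(sum(evaluate(host, p) for p in opponents))
            hosts[:] = evolve(hosts, fitnesses)
    return pop_a, pop_b
```

The key property is that fitness is always *relative*: a host's score depends on the current opposing population, which is what drives the arms race between A and B.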
Who's the best?

Before moving to the animations, we should address one serious question: how do we know that one network is better than another from just a single competition? There are two answers to this question:
First, the training scenarios are all deterministic (meaning that the food and robots start in exactly the same positions). Therefore, if a network wins against another network in this setup, it will always win, since we did not introduce noise to the sensors.
However, in testing the networks for superiority we should really play many different competitions to attain statistical significance demonstrating that one network really is better than another. Thus, we played each pair of competitors 1,000 times with slightly different food configurations. The superior network was verified by winning a significant majority of these 1,000 competitions. Thus, the movies below are not the reason that we say one network is dominant over another. Rather, they are interesting to observe after the fact, knowing which network is dominant. Knowing this, we can see what strategy the dominant network actually uses by watching the movies. It just so happens that in the movies below, the superior network is always the winner, which is not surprising, since it is what we expect probabilistically.
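The claim that a majority over 1,000 games establishes superiority can be checked with a simple binomial (sign) test over the decisive games, sketched here with Python's standard library. The win counts in the example are the ones reported later on this page (617 wins versus 376, draws excluded):

```python
from math import comb

def binomial_sf(wins, games, p=0.5):
    """P(X >= wins) for X ~ Binomial(games, p): the chance that a
    network no better than its opponent would win at least this many
    of the decisive games by luck alone."""
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(wins, games + 1))

# 617 wins out of 993 decisive games: far beyond a coin flip.
p_value = binomial_sf(617, 617 + 376)
```

With a p-value many orders of magnitude below any conventional threshold, the majority is statistically decisive, which is why the movies can be read as illustrations of an already-established dominance rather than as evidence for it.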
These early networks are the champions of generations 1 and 3, respectively. The movie begins well into the round, near the second piece of food; the yellow robot has already spent a long time moving toward the food extremely slowly, and the red robot has hardly moved at all. The yellow network runs into the red network mostly by chance after getting its food, although it does show some intelligence. (In case you wonder, the yellow network loses badly to later dominant strategies.)
These champions from generations 20 and 50 have evolved to conserve energy by sitting still and letting the opponent waste energy by moving. The problem is that when two such networks meet, nothing happens. That's why this clip doesn't move.
The champions from populations A and B in generation 40 compete in this clip. Neither bothers to use food at all. The red robot wastes a lot of energy moving around to no end, while the yellow robot slowly edges in. The strategies are poor because they do not utilize the food resources.
Around generation 80, the first good strategy arose in population A. This competition pits the champion of generation 80 against an improved version of the same strategy by the champion of generation 95. Note that both competitors here are using a sophisticated strategy with a clear switching behavior among food gathering, caution, and predation. The competition culminates in an interesting standoff where both players have similar energy levels. The robots gingerly approach each other, hoping the other one will waste more energy in the approach.
The standoff is actually quite complex. If one robot moves and the other doesn't, then the mover now has relatively less energy. However, there is no way to win if both robots just stand still, leading to an interesting dilemma. Since robots only receive credit for winning, there is great incentive to take risks. The robots tend to flinch in an attempt to get the other robot to move more, leading to an "old west"-like duel at the end.
Notice that this somewhat sophisticated strategy requires only a single hidden node, lending support to the practice of starting evolution minimally.
The movie shows an "old west"-style standoff similar to the one in the previous clip, but even more cautious and drawn-out. The competitors here are champions from generations 95 and 90. The movie begins as the standoff begins, skipping the preceding food-gathering.
A later dominant strategy arose around the 190th generation in population B. By generation 221, this new strategy dominated the world. This movie shows the champion of generation 221 playing the champion of generation 130, which is a highly optimized version of the first good strategy to arise. (The lower network, which is the red robot, is from generation 221.)
Notice that the later dominant strategy is more cautious than the first, and tends to be conservative about collecting enough food to be safe. We can see the red robot change its strategy in mid-stride at several points, deciding to get more food instead of pursuing the yellow robot.
Importantly for the theory behind continual competitive coevolution, the later, more sophisticated dominant strategy utilizes a more complex structure than the inferior network.
Out of 1,000 competitions from slightly different starting configurations, the later dominant strategy won 617, the earlier strategy won 376, and the remaining competitions were draws (timed out).
The highest dominant strategy arose in generation 313 in population A. This new strategy appears as the lower neural network (red robot). The complexity has clearly gone up significantly. The champion of generation 210 (the prior dominant strategy) is the upper robot (yellow).
Notice that the red robot takes very clever advantage of the yellow robot's tendency to collect as much food as possible. The red robot decides to do nothing while the yellow robot rolls to the second-to-last piece of food. In a surprisingly insightful move, the red robot then dashes for the last piece of food, ensuring victory through its closer proximity. The yellow robot is caught with nowhere to go and quickly devoured.
In 1,000 different competitions between these two strategies, the highest dominant won 532, and the prior dominant won 448. The remainder were draws (timed out).
To be sure that strategies are indeed progressing in sophistication in a total order, we need to check for circular dominance. In other words, even though the highest dominant strategy is superior to the prior dominant, the even-earlier first good strategy might still be able to defeat the highest dominant, creating a circle of dominance. Circular dominance is not what we want in competitive coevolution.
The movie shows the champion of generation 95 (yellow) playing the champion of generation 313 (red). To check for circular dominance, we test the later dominant strategy against the earlier strategy. The highest dominant appears to prevail by being more decisive than the earlier strategy, as this movie shows. The yellow robot simply finds itself in trouble before there is time to do anything about it.
In 1,000 different competitions, the highest dominant won 599, and the other won 313. The remainder were ties.