The movies on this page demonstrate the usefulness of fitness-based shaping techniques in evolving agent behavior for multiobjective domains. Using fitness-based shaping in evolution means changing selection pressures in order to encourage the long-term development of successful agent behavior. Selection is typically based only on fitness functions directly related to the task, but such an approach can easily get stuck in suboptimal regions of the fitness landscape. Modifying selection via fitness-based shaping means that some solutions are preserved that would otherwise be discarded due to their lower fitness, but the point of fitness-based shaping is that keeping these individuals in the population will eventually lead to better solutions in the long term.
Two fitness-based shaping methods are evaluated in the Battle Domain.
Monster agents are evolved against a bot that attacks the monsters with a bat.
The Battle Domain consists of three objectives for the evolved monsters: dealing damage to the bat-wielding bot, avoiding damage from the bat, and staying alive for as long as possible.
Movies of evolved behaviors are shown below.
One approach to fitness-based shaping is to add an explicit diversity objective to the objective set. This extra objective poses no problem because the multiobjective evolutionary algorithm NSGA-II is already being used to evolve in the presence of multiple objectives. The diversity objective rewards individuals that behave in a manner significantly different from other members of the population. The resulting populations will contain some poorly performing individuals, but the population will do a better job of exploring different possible behaviors, which includes good behaviors as well. Some examples are shown in this video:
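A behavioral diversity objective of this kind can be sketched as follows. This is a simplified illustration, not the exact method from the publication: here each individual is summarized by a behavior vector (the characterization is domain-specific, e.g. sampled action traces), and its diversity score is its mean distance to every other member of the population.

```python
import math

def behavioral_diversity(behaviors, i):
    """Diversity objective for individual i: the mean Euclidean
    distance between its behavior vector and every other member's.
    Individuals that behave unlike the rest score highest."""
    me = behaviors[i]
    dists = [
        math.dist(me, other)
        for j, other in enumerate(behaviors)
        if j != i
    ]
    return sum(dists) / len(dists)

# Toy population of 2-D behavior characterizations.
pop = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
scores = [behavioral_diversity(pop, i) for i in range(len(pop))]
```

In a real run, this score would simply be appended to the task objectives handed to NSGA-II, so that behaviorally novel individuals can survive selection even when their task fitness is low. In this toy population, the outlier behavior `(5.0, 5.0)` receives the highest diversity score.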
Targeting Unachieved Goals
When shaping is not used, evolved agents will initially focus on the objectives that are easier to perform well in. Sometimes this biases the population so strongly that it is hard for agents that also do well in the other objectives to emerge. Targeting Unachieved Goals (TUG) gets around this problem by dynamically deactivating objectives from the selection process when they are not needed. Numeric goal values are set for each objective. If individuals in the population tend to surpass the goal value for a given objective, then that objective is deactivated to give the population a chance to do well in the other objectives. If performance then degrades, the objective is switched back on, but the population will tend to be filled with agents that can maintain reasonable behavior in objectives that are switched off. Ultimately, turning objectives on and off helps evolution to quickly perform well in all objectives. Furthermore, the goal values increase as they are surpassed, until they converge to the upper limits of achievable performance in the domain.
TUG can also learn great behavior even when the initial goal values do not incorporate expert knowledge. For the experiments below, the initial TUG goals were the minimal scores in each objective, but because TUG adjusts the goals to suit the performance of the population, they gradually rise and the population performs very well.
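The TUG bookkeeping described above can be sketched roughly as follows. This is a simplified illustration under assumed names (the objective labels and the `step` parameter are hypothetical, and the actual published method tracks performance with more care): each generation, any objective whose average population score surpasses its goal is deactivated and has its goal raised toward that score; any objective falling short is kept active for selection.

```python
def tug_step(goals, active, avg_scores, step=0.5):
    """One simplified TUG update. Mutates `goals` and `active`:
    - an objective whose average score meets its goal is deactivated,
      and its goal moves partway toward the score that surpassed it,
      so goals climb as the population improves;
    - an objective falling below its goal is (re)activated, so
      selection pressure returns to it."""
    for obj in goals:
        if avg_scores[obj] >= goals[obj]:
            active[obj] = False   # goal met: stop selecting on this objective
            goals[obj] += step * (avg_scores[obj] - goals[obj])
        else:
            active[obj] = True    # performance lagging: select on it again

# Hypothetical Battle Domain objectives with mid-run goal values.
goals = {"damage_dealt": 5.0, "damage_avoided": 5.0, "time_alive": 5.0}
active = {obj: True for obj in goals}
avg_scores = {"damage_dealt": 8.0, "damage_avoided": 2.0, "time_alive": 6.0}
tug_step(goals, active, avg_scores)
```

Selection (e.g. via NSGA-II) would then use only the objectives marked active, which is how easy objectives stop dominating the population while hard ones still receive pressure. Starting every goal at the minimal possible score, as in the experiments above, simply means all objectives deactivate and their goals begin rising in the very first generations.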
The following movies show the types of evolved behaviors that are more common when using NSGA-II without any sort of fitness-based shaping.
Further details about the shaping methods used and the domain depicted above are available in the publication below.