Two Methods for Hierarchy Learning in Reinforcement Environments, in From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior (SAB '92), 1992.

This paper describes two methods for hierarchically organizing temporal behaviors.  The first is more intuitive: grouping together common sequences of events into single units so that they may be treated as individual behaviors.  This system immediately encounters problems, however, because the units are binary, meaning the behaviors must execute completely or not at all, and this hinders the construction of good training algorithms.  The system also runs into difficulty when more than one unit is (or should be) active at the same time.  The second system is a hierarchy of transition values.  This hierarchy dynamically modifies the values that specify the degree to which one unit should follow another.  These values are continuous, allowing the use of gradient descent during learning.  Furthermore, many units are active at the same time as part of the system's normal functioning.
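
To make the idea of continuous transition values concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a flat (non-hierarchical) table in which values[i][j] is the degree to which unit j should follow unit i, and it trains that table by gradient descent on a softmax cross-entropy loss. The function names, the loss, the learning rate, and the toy sequence are all assumptions made for illustration.

```python
import math


def softmax(values):
    """Turn a row of transition values into graded activations that sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def train_transition_values(sequence, n_units, lr=0.5, epochs=50):
    """Learn values[i][j]: the degree to which unit j should follow unit i.

    Hypothetical illustration only: a flat table trained by gradient
    descent on cross-entropy, with no hierarchy modifying the values.
    """
    values = [[0.0] * n_units for _ in range(n_units)]
    for _ in range(epochs):
        for prev, nxt in zip(sequence, sequence[1:]):
            probs = softmax(values[prev])
            for j in range(n_units):
                target = 1.0 if j == nxt else 0.0
                # Gradient of cross-entropy with respect to values[prev][j].
                values[prev][j] -= lr * (probs[j] - target)
    return values


# Toy event sequence (assumed): 0 -> 1 -> 2 -> 0 -> ...
sequence = [0, 1, 2, 0, 1, 2, 0, 1, 2]
values = train_transition_values(sequence, n_units=3)

# For each unit, the successor with the largest learned transition value.
predicted = [max(range(3), key=lambda j: row[j]) for row in values]
print(predicted)  # [1, 2, 0]
```

Because the values are real numbers rather than binary switches, each update is a small graded step, and every candidate successor keeps a nonzero activation throughout training. The hierarchy described in the abstract, in which higher levels dynamically modify these values, is not modeled in this sketch.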