This paper describes and evaluates how confidence values can be used to improve the quality of exploration in Q-Routing for adaptive packet routing in communication networks. In Q-Routing, each node in the network has a routing decision maker that adapts on-line, learning routing policies that can sustain high network loads with low average packet delivery time. These decision makers maintain their view of the network in terms of Q values, which are updated as routing takes place. In Confidence-based Q-Routing (CQ-Routing), an improved implementation of Q-Routing, each Q value is annotated with a confidence (C value) that measures how closely the corresponding Q value represents the current state of the network. While the learning rate in Q-Routing is a constant, the learning rate in CQ-Routing is computed for each update as a function of the confidence values of the old and estimated Q values: if either the old Q value has low confidence or the estimated Q value has high confidence, the learning rate is high. This variable learning rate improves the quality of exploration in CQ-Routing. Experiments over several network topologies show that at low and medium loads, CQ-Routing learns adequate routing policies significantly faster than Q-Routing, and at high loads, it learns routing policies that are significantly better than those learned by Q-Routing in terms of average packet delivery time. CQ-Routing can sustain higher network loads than Q-Routing, non-adaptive shortest-path routing, and adaptive Bellman-Ford routing. Finally, CQ-Routing was found to adapt significantly faster than Q-Routing to changes in network topology.
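The confidence-weighted update described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact formulation: the rule that the learning rate is high when the old Q value has low confidence or the estimate has high confidence is here rendered as `max(c_est, 1 - c_old)`, which is one simple function with that property; the paper's actual functional form and confidence-update rules may differ.

```python
def learning_rate(c_old: float, c_est: float) -> float:
    """Illustrative variable learning rate: large when the old Q value
    is unreliable (low c_old) or the new estimate is reliable (high
    c_est). Confidences are assumed to lie in [0, 1]."""
    return max(c_est, 1.0 - c_old)

def cq_update(q_old: float, q_est: float, c_old: float, c_est: float) -> float:
    """One CQ-Routing-style update: move the old Q value toward the
    estimate by an amount scaled by the confidence-dependent rate
    (standard Q-Routing would use a constant rate instead)."""
    eta = learning_rate(c_old, c_est)
    return q_old + eta * (q_est - q_old)
```

For example, with a fully trusted old value and an untrusted estimate (`c_old = 1, c_est = 0`) the rate is 0 and the Q value is unchanged, while an untrusted old value (`c_old = 0`) yields a rate of 1 and the Q value jumps straight to the estimate.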
Smart Engineering Systems: Neural Networks, Fuzzy Logic, Data Mining, and Evolutionary Programming, C. H. Dagli, M. Akay, O. Ersoy, B. R. Fernandez, and A. Smith (Eds.), Vol. 8 (1998).

Shailesh Kumar
Risto Miikkulainen