Adaptive Packet Routing: The Confidence-Based Dual Reinforcement Q-Learning Algorithm
Active from 1998 - 2000
Standard reinforcement learning (TD learning or Q-learning) is based on forward exploration: later estimates are used to update earlier ones. Dual Reinforcement Learning also uses backward exploration, in which earlier estimates are used to update later ones. The quality of the estimates can be improved further by tracking how recently each one was updated, which gives a measure of confidence in it. In this project, these ideas are applied to the Q-routing algorithm for adaptive packet routing in communication networks, where they improve both the speed of learning and the quality of the final routing policy.
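The three ideas above can be illustrated with a minimal sketch of what happens on a single packet hop. The description here does not give the update rules, so everything below is an assumption made for illustration: the names Node, hop, INIT_Q and DECAY, the learning rate max(target_conf, 1 - old_conf), and the multiplicative confidence decay. It should be read as one plausible form, not as the project's actual implementation.

```python
from dataclasses import dataclass, field

INIT_Q = 0.0   # assumed initial delivery-time estimate
DECAY = 0.9    # assumed per-step decay of confidence in stale estimates


@dataclass
class Node:
    name: str
    neighbors: list
    q: dict = field(default_factory=dict)  # (dest, neighbor) -> estimated delivery time
    c: dict = field(default_factory=dict)  # (dest, neighbor) -> confidence in [0, 1]

    def best(self, dest):
        """Return (estimate, confidence) for this node's best known route to dest."""
        if dest == self.name:
            return 0.0, 1.0  # a node reaches itself in zero time, with certainty
        nbr = min(self.neighbors, key=lambda n: self.q.get((dest, n), INIT_Q))
        return self.q.get((dest, nbr), INIT_Q), self.c.get((dest, nbr), 0.0)

    def update(self, dest, via, target, target_conf):
        """Move Q toward target; learn faster when the old value is stale
        or the incoming estimate is trusted (assumed rule)."""
        key = (dest, via)
        old_q = self.q.get(key, INIT_Q)
        old_c = self.c.get(key, 0.0)
        rate = max(target_conf, 1.0 - old_c)
        self.q[key] = old_q + rate * (target - old_q)
        self.c[key] = old_c + rate * (target_conf - old_c)

    def decay(self, factor=DECAY):
        """Lower confidence in every estimate to reflect that it has aged one step."""
        for key in self.c:
            self.c[key] *= factor


def hop(x, y, source, dest, delay):
    """Packet travelling source -> dest moves from node x to neighbor y."""
    fwd_est, fwd_conf = y.best(dest)      # later estimate, returned to x
    bwd_est, bwd_conf = x.best(source)    # earlier estimate, carried to y
    # Forward exploration: x refines its estimate for reaching dest via y.
    x.update(dest, y.name, delay + fwd_est, fwd_conf)
    # Backward exploration: y refines its estimate for reaching source via x.
    y.update(source, x.name, delay + bwd_est, bwd_conf)


a = Node("A", ["B"])
b = Node("B", ["A"])
hop(a, b, source="A", dest="B", delay=2.0)
a.decay()
```

In this sketch a single hop produces two updates, one in each direction, which is why backward exploration can speed up learning without extra packets. The confidence values let a node overwrite a stale estimate quickly while changing a fresh one only cautiously.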