Todd Hester's Research

Research Interests

I currently work at DeepMind, leading the London applied research team, where we apply state-of-the-art machine learning to high-impact products at Google. I manage a team of 12 research scientists and engineers whose goal is to improve products by applying cutting-edge research and to investigate the challenges that prevent us from deploying more ML in products. I manage and mentor my team members, help set the research direction of the larger team, and lead multiple research and product projects. I've worked in areas including recommender systems, industrial controls, maps, and robotics. One public project I have worked on is our ML-based optimization of cooling efficiency at Google data centers.

Previously, I worked at Nest Labs, where I developed learning algorithms that ran on-board the thermostat, such as Auto-Schedule (learning a user's temperature schedule from dial changes), learning a thermal model of the home, and using that model together with utility rate plans to plan HVAC control optimally (e.g., learning to pre-cool the home before rates increased).
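The pre-cooling idea is easy to see in miniature: given a (learned) thermal model and a time-of-use rate plan, a planner can search over on/off schedules for the cheapest one that keeps the home inside a comfort band. The toy sketch below does this by brute force over 8 hourly slots. To be clear, this is not Nest's actual algorithm: the linear thermal model, the constants, and the rate plan are all invented for illustration.

```cpp
// Toy illustration of model-based HVAC planning under time-of-use rates.
// NOT Nest's algorithm: the linear thermal model, constants, and rate
// plan below are invented for illustration.
#include <cstdio>

int main() {
    const int N = 8;                          // plan over 8 one-hour slots
    const double price[N] = {0.10, 0.10, 0.10, 0.10,
                             0.30, 0.30, 0.30, 0.10};  // $/kWh, peak at hours 4-6
    const double drift = 1.0;                 // home warms 1 degF per hour (hot day)
    const double coolRate = 2.5;              // degF removed per hour with AC on
    const double kwh = 3.0;                   // AC energy use per on-hour
    const double lo = 70.0, hi = 76.0;        // comfort band (degF)
    const double t0 = 74.0;                   // starting indoor temperature

    double bestCost = 1e9;
    int bestPlan = -1;
    for (int plan = 0; plan < (1 << N); ++plan) {      // enumerate all on/off plans
        double temp = t0, cost = 0.0;
        bool ok = true;
        for (int t = 0; t < N; ++t) {
            bool on = (plan >> t) & 1;
            temp += drift - (on ? coolRate : 0.0);     // simulate the thermal model
            cost += on ? price[t] * kwh : 0.0;
            if (temp < lo || temp > hi) { ok = false; break; }
        }
        if (ok && cost < bestCost) { bestCost = cost; bestPlan = plan; }
    }
    if (bestPlan < 0) { printf("no feasible plan\n"); return 1; }
    printf("cheapest comfortable plan ($%.2f): ", bestCost);
    for (int t = 0; t < N; ++t)
        printf("%s", ((bestPlan >> t) & 1) ? "ON " : "off ");
    printf("\n");  // the winning plan cools during cheap hours, before the peak
    return 0;
}
```

With these made-up numbers the cheapest feasible schedule runs the AC only during the $0.10 hours before the price peak, which is exactly the pre-cooling behavior described above.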

My Ph.D. research focused on enabling robots to learn and adapt online to new tasks using reinforcement learning (RL). Performing RL on robots poses significant challenges for existing algorithms, such as learning from few samples, acting in real time, and handling noise and delays in sensors and actuators. My thesis presented an RL algorithm called TEXPLORE that addressed these challenges and was tested on two robot platforms. I also participated in RoboCup, the international robot soccer competition in which teams program robots to play soccer autonomously. I worked on all aspects of the robot soccer problem: computer vision, localization, ball tracking, opponent tracking, humanoid motion, and multi-robot coordination. In 2012, our team won the international RoboCup competition against a field of 25 teams.
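To give a flavor of the real-time constraint: a robot's control loop cannot pause while the agent learns a model or plans, so a real-time architecture decouples acting from model learning and planning by running them in parallel, with the action thread always returning the current best action on schedule. The sketch below shows only that threading pattern; the toy "planner" and all names are invented, and this is not the thesis code.

```cpp
// Minimal sketch of the real-time pattern: act at a fixed control
// frequency while planning improves the policy in the background.
// The trivial "planner" here is a stand-in for model-based planning.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>

std::mutex policyMu;
int bestAction = 0;            // shared "policy": a single best action here
std::atomic<bool> done{false};

// Background planner: keeps refining the policy with whatever time it has.
void planLoop() {
    int iter = 0;
    while (!done) {
        ++iter;                                    // stand-in for model rollouts
        {
            std::lock_guard<std::mutex> lk(policyMu);
            bestAction = iter % 4;                 // pretend planning improved things
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

int main() {
    std::thread planner(planLoop);
    auto next = std::chrono::steady_clock::now();
    for (int step = 0; step < 10; ++step) {        // 10 control cycles at 10 Hz
        next += std::chrono::milliseconds(100);
        int a;
        {   // act immediately from the current policy; never block on planning
            std::lock_guard<std::mutex> lk(policyMu);
            a = bestAction;
        }
        printf("cycle %d: action %d\n", step, a);
        std::this_thread::sleep_until(next);       // hold the real-time rate
    }
    done = true;
    planner.join();
    return 0;
}
```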

Links:

Curriculum Vitae
Google Scholar
LinkedIn

Publications

Ph.D. Thesis

  • T. Hester. TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains. Ph.D. Thesis, The University of Texas at Austin, Department of Computer Science, AI Laboratory, December 2012. (PDF)(Slides)

Book Chapters

  • S. Kalyanakrishnan, T. Hester, M. Quinlan, Y. Bentor, and P. Stone. Three Humanoid Soccer Platforms: Comparison and Synthesis. In RoboCup 2009: Robot Soccer World Cup XIII, Lecture Notes in Artificial Intelligence, J. Baltes, M. Lagoudakis, T. Naruse, and S. S. Ghidary, Editors, pp. 140–152, Springer-Verlag, 2010. (PDF)

Refereed Conferences

  • M. Vecerik, O. Sushkov, D. Barker, T. Rothorl, T. Hester, and J. Scholz. A practical approach to insertion with variable socket position using deep reinforcement learning. In IEEE International Conference on Robotics and Automation (ICRA), May 2019. (PDF)
  • T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, A. Sendonaris, G. Dulac-Arnold, I. Osband, J. Agapiou, J. Z. Leibo, and A. Gruslys. Deep Q-Learning from Demonstrations. In Association for the Advancement of Artificial Intelligence (AAAI), Feb 2018. (PDF)
  • P. Boissy, T. Hester, D. M. Sherrill, H. Corriveau, and P. Bonato. Monitoring Mobility Assistive Device Use in Patients After Stroke. In Proceedings of the 16th Congress of the International Society of Electrophysiology and Kinesiology (ISEK), June-July 2006.
  • T. Hester, R. Hughes, D. M. Sherrill, S. Patel, N. Huggins, A. Flaherty, D. Standaert, and P. Bonato. Adjusting DBS Settings to Optimize Parkinson’s Control Therapy. In Proceedings of the 16th Congress of the International Society of Electrophysiology and Kinesiology (ISEK), June-July 2006.

Refereed Workshop Papers

  • G. Dulac-Arnold, D. J. Mankowitz, and T. Hester. Challenges of real-world reinforcement learning. In ICML Workshop on Reinforcement Learning for Real Life (RLRL), June 2019. (PDF)
  • T. Hester and P. Stone. TEXPLORE: Real-Time Sample-Efficient Reinforcement Learning for Robots. In Proceedings of the AAAI Spring Symposium on Designing Intelligent Robots: Re-Integrating AI, March 2012.

Patents

  • T. Hester, A. J. Minich, and G. A. Heitz III. Enhanced automated environmental control system scheduling using a preference function. Apr. 3 2018. US Patent 9,933,177.
  • T. A. Hester, E. J. Fisher, and P. Khandelwal. Predictively controlling an environmental control system. Jan. 16 2018. US Patent 9,869,484.
  • T. A. Hester, A. J. Minich, and G. A. Heitz III. Enhanced automated control scheduling. Sept. 26 2017. US Patent 9,772,116.
  • S. Y. Shafi, T. Hester, J. Ben-Meshulam, and S. R. Dey. Identification of similar users. Sept. 5 2017. US Patent 9,756,478.
  • M. R. Malhotra, S. Le Guen, J. A. Boyd, J. T. Lee, and T. Hester. Learned overrides for home security. Dec. 13 2016. US Patent 9,520,049.

Pending Patents

  • M. R. Malhotra, S. Le Guen, J. A. Boyd, J. T. Lee, and T. Hester. Operating a security system. Mar. 5 2019. US Patent App. 10/223,896.
  • R. A. Evans, J. Gao, M. C. Ryan, G. Dulac-Arnold, J. K. Scholz, and T. A. Hester. Optimizing data center controls using neural networks. July 19 2018. US Patent App. 15/410,547.
  • J. Crimins, S. Ruffner, A. Minich, T. Hester, and A. Sahl. Thermostat algorithms and architecture for efficient operation at low temperatures. Apr. 12 2018. US Patent App. 15/286,564.
  • W. Greene, S. McGaraghan, J. Crimins, S. Ruffner, A. Minich, T. Hester, A. Sahl, and P. Subramani. Architecture for thermostat control during time-of-use intervals. Dec. 21 2017. US Patent App. 15/187,562.
  • J. Crimins, S. Ruffner, A. Minich, T. Hester, A. Sahl, and P. Subramani. Architecture for thermostat control during peak intervals. Dec. 21 2017. US Patent App. 15/187,565.
  • I. Karp, L. Stesin, C. Pi-Sunyer, M. A. McBride, A. Dubman, J. Lyons, S. W. Kortz, G. J. Hu, A. Surya, A. Thelen, et al. Methods and apparatus for using smart environment devices via application program interfaces. July 6 2017. US Patent App. 15/380,767.
  • P. Verhoeven and T. Hester. Coordinating energy use of disparately-controlled devices in the smart home based on near-term predicted HVAC control trajectories. Apr. 13 2017. US Patent App. 14/881,807.
  • P. Verhoeven and T. Hester. Persistent home thermal comfort model reusable across multiple sensor and device configurations in a smart home. Feb. 23 2017. US Patent App. 14/832,675.
  • P. Verhoeven and T. Hester. Persistent thermal model and method of using same for automatically determining the presence of an additional thermal source other than the HVAC system being controlled. Feb. 23 2017. US Patent App. 14/832,702.
  • P. P. Reddy, M. Malhotra, E. J. Fisher, T. Hester, M. A. McBride, and Y. Matsuoka. Intelligent configuration of a smart environment based on arrival time. Dec. 24 2015. US Patent App. 14/531,805.

Technical Reports

  • D. J. Mankowitz, N. Levine, R. Jeong, A. Abdolmaleki, J. T. Springenberg, T. A. Mann, T. Hester, and M. A. Riedmiller. Robust reinforcement learning for continuous control with model misspecification. arXiv, vol. abs/1906.07516, 2019. (PDF)
  • T. Pohlen, B. Piot, T. Hester, M. G. Azar, D. Horgan, D. Budden, G. Barth-Maron, H. van Hasselt, J. Quan, M. Vecerik, M. Hessel, R. Munos, and O. Pietquin. Observe and look further: Achieving consistent performance on Atari. arXiv, vol. abs/1805.11593, 2018. (PDF)
  • G. Dalal, K. Dvijotham, M. Vecerik, T. Hester, C. Paduraru, and Y. Tassa. Safe exploration in continuous action spaces. arXiv, vol. abs/1801.08757, 2018. (PDF)
  • M. Vecerik, T. Hester, J. Scholz, F. Wang, O. Pietquin, B. Piot, N. Heess, T. Rothorl, T. Lampe, and M. A. Riedmiller. Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. arXiv, vol. abs/1707.08817, 2017. (PDF)
  • T. A. Mann, H. Penedones, S. Mannor, and T. Hester. Adaptive lambda least-squares temporal difference learning. arXiv, vol. abs/1612.09465, 2016. (PDF)
  • Nest Labs. Thermal model and HVAC control white paper. Nov 2015. (PDF)
  • Nest Labs. Enhanced Auto-Schedule. Nov 2014. (PDF)
  • S. Barrett, K. Genter, M. Hausknecht, T. Hester, P. Khandelwal, J. Lee, M. Quinlan, A. Tian, P. Stone, and M. Sridharan. Austin Villa 2010 Standard Platform Team Report. Technical Report UT-AI-TR-11-01, The University of Texas at Austin, Department of Computer Science, AI Laboratory, 2011. (PDF)
  • T. Hester, M. Quinlan, P. Stone, and M. Sridharan. UT Austin Villa 2009: Naos Across Texas. Technical Report UT-AI-TR-09-08, The University of Texas at Austin, Department of Computer Science, AI Laboratory, 2009. (PDF)

Invited Talks

  • A Real-Time Model-Based Reinforcement Learning Architecture for Robot Control. At The 6th Barbados Workshop on Reinforcement Learning, March 2011.
  • Generalized Model Learning for Reinforcement Learning in Factored Domains. At The 4th Barbados Workshop on Reinforcement Learning, April 2009.

RoboCup Results

  • 2012 Standard Platform League WORLD CHAMPIONS
  • 2012 Standard Platform League US Open Champions
  • 2010 Standard Platform League 3rd Place International Competition
  • 2010 Standard Platform League US Open Champions
  • 2009 Standard Platform League US Open Champions

Videos

A Practical Approach to Insertion Using Deep Reinforcement Learning

Video of our DDPGfD agent (DDPG from Demonstrations) learning to insert deformable objects.

DQfD playing Montezuma's Revenge

Video of the DQfD agent, which combined Deep Q-Learning with learning from demonstrations, playing Montezuma's Revenge.

DQfD playing Pitfall

Video of the DQfD agent, which combined Deep Q-Learning with learning from demonstrations, playing Pitfall.

DQfD playing Hero

Video of the DQfD agent, which combined Deep Q-Learning with learning from demonstrations, playing Hero.

2012 RoboCup Final: Austin Villa vs. B-Human

Video of our 2012 SPL final against B-Human, who had won the last 3 years and had never lost a game.

2012 RoboCup Semi-Final: Austin Villa vs. rUNSWift

Video of our 2012 SPL semi-final against rUNSWift. This was a very exciting game. We spent the entire game tied, down 1, or down 2, until taking a lead with 1:30 left and holding on to win.

2010 RoboCup Highlights

Highlights of TT-UT Austin Villa at the 2010 RoboCup Standard Platform League competition in Singapore, where the team took 3rd place.

Learning to Score Penalty Kicks via Reinforcement Learning

The accompanying video for our ICRA 2010 paper, where we learn to score penalty kicks via a novel model-based reinforcement learning method.

2009 RoboCup Highlights

Highlights of TT-UT Austin Villa at the 2009 RoboCup Standard Platform League. TT-UT Austin Villa finished in 4th place, losing to only two teams during the tournament.

2009 US Open Highlights

Highlights of TT-UT Austin Villa at the 2009 US Open. TT-UT Austin Villa won the 2009 US Open with a finals win over UPenn (1-1 tie, 3-2 in penalty kicks).

Aibo Highlights

This video shows highlights (both shots and saves) from demonstrations held during Explore UT on March 7, 2009.

Teaching

In the Spring 2013 semester, I taught the CS 378: Autonomous Intelligent Robotics (FRI) course.

In the Fall 2012 semester, I taught the CS 344M: Autonomous Multiagent Systems course.

In the Spring 2012 semester, I was the TA for CS 378: Autonomous Vehicles in Traffic I. This course is part of the Freshman Research Initiative (FRI).

In the Fall 2009 semester, I was the TA for CS 393R: Autonomous Robotics. I won the department's Outstanding TA Award.

In the Spring 2009 semester, I was a TA for CS 307: Foundations of Computing.

Open Source Code

I have released a package (rl-texplore-ros-pkg) of reinforcement learning code for ROS. It contains a set of RL agents and environments, along with a formalism for them to communicate via ROS messages. In particular, the agents include an implementation of our TEXPLORE agent (see our ICDL paper) and our real-time architecture for model-based agents (see our ICRA paper). A common interface is defined for agents, environments, models, and planners, so it should be easy to add new agents, or to add new model learning or planning methods to the existing general model-based agent; the real-time architecture should work with any model learning method that fits the defined interface. And because the agents communicate through ROS messages, they are easy to integrate with robots that already use a ROS architecture, making it straightforward to perform reinforcement learning on robots.
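As a rough sketch of what such a common interface looks like: an environment exposes its state, accepts actions, and reports rewards; an agent maps states and rewards to actions; and one generic episode loop (or a pair of ROS nodes exchanging state/reward and action messages) can then run any agent against any environment. The names below are illustrative rather than the package's exact API; see rl-texplore-ros-pkg for the real headers and message definitions.

```cpp
// Illustrative agent/environment interfaces in the spirit of the package;
// not its exact API. In the package, the same exchange can also happen
// between two ROS nodes via state/reward and action messages.
#include <cstdio>
#include <vector>

struct Environment {
    virtual std::vector<float> sensation() = 0;   // current state features
    virtual float apply(int action) = 0;          // take action, return reward
    virtual bool terminal() = 0;
    virtual ~Environment() {}
};

struct Agent {
    virtual int first_action(const std::vector<float>& s) = 0;
    virtual int next_action(float r, const std::vector<float>& s) = 0;
    virtual void last_action(float r) = 0;        // final reward of the episode
    virtual ~Agent() {}
};

// One generic loop runs any agent against any environment.
float runEpisode(Agent& agent, Environment& env) {
    float total = 0;
    int a = agent.first_action(env.sensation());
    for (;;) {
        float r = env.apply(a);
        total += r;
        if (env.terminal()) { agent.last_action(r); return total; }
        a = agent.next_action(r, env.sensation());
    }
}

// Tiny example pair: walk right along a 1-D corridor to the goal at cell 5.
struct Corridor : Environment {
    int pos = 0;
    std::vector<float> sensation() override { return {float(pos)}; }
    float apply(int action) override {
        pos += (action == 1 ? 1 : -1);
        if (pos < 0) pos = 0;
        return pos == 5 ? 1.0f : -0.01f;          // step cost, goal reward
    }
    bool terminal() override { return pos == 5; }
};

struct AlwaysRight : Agent {
    int first_action(const std::vector<float>&) override { return 1; }
    int next_action(float, const std::vector<float>&) override { return 1; }
    void last_action(float) override {}
};

int main() {
    Corridor env;
    AlwaysRight agent;
    printf("episode return: %.2f\n", runEpisode(agent, env));
    return 0;
}
```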