Todd Hester's Research

Research Interests

I currently work at DeepMind, leading the London applied research team, where we apply state-of-the-art machine learning to high-impact products at Google. I manage a team of 12 research scientists and engineers whose goal is to improve products by applying cutting-edge research and to investigate the challenges that prevent us from deploying more ML in products. I manage and mentor my team members, help set the research direction of the larger team, and lead multiple research and product projects. I've worked in areas including recommender systems, industrial controls, maps, and robotics. One public project I have worked on is our ML-based optimization of cooling efficiency at Google data centers.

Previously, I worked at Nest Labs, where I developed learning algorithms that ran on-board the thermostat, such as Auto-Schedule (learning a user's temperature schedule from dial changes), learning a thermal model of the home, and using that model together with utility rate plans to plan HVAC control optimally (e.g., learning to pre-cool the home before rates increased).
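The pre-cooling idea is easy to see in miniature: given a (learned) thermal model and a time-of-use rate plan, a planner can search over on/off schedules for the cheapest one that keeps the home inside a comfort band. The toy sketch below does this by brute force over 8 hourly slots. To be clear, this is not Nest's actual algorithm: the linear thermal model, the constants, and the rate plan are all invented for illustration.

```cpp
// Toy illustration of model-based HVAC planning under time-of-use rates.
// NOT Nest's algorithm: the linear thermal model, constants, and rate
// plan below are invented for illustration.
#include <cstdio>

int main() {
    const int N = 8;                          // plan over 8 one-hour slots
    const double price[N] = {0.10, 0.10, 0.10, 0.10,
                             0.30, 0.30, 0.30, 0.10};  // $/kWh, peak at hours 4-6
    const double drift = 1.0;                 // home warms 1 degF per hour (hot day)
    const double coolRate = 2.5;              // degF removed per hour with AC on
    const double kwh = 3.0;                   // AC energy use per on-hour
    const double lo = 70.0, hi = 76.0;        // comfort band (degF)
    const double t0 = 74.0;                   // starting indoor temperature

    double bestCost = 1e9;
    int bestPlan = -1;
    for (int plan = 0; plan < (1 << N); ++plan) {      // enumerate all on/off plans
        double temp = t0, cost = 0.0;
        bool ok = true;
        for (int t = 0; t < N; ++t) {
            bool on = (plan >> t) & 1;
            temp += drift - (on ? coolRate : 0.0);     // simulate the thermal model
            cost += on ? price[t] * kwh : 0.0;
            if (temp < lo || temp > hi) { ok = false; break; }
        }
        if (ok && cost < bestCost) { bestCost = cost; bestPlan = plan; }
    }
    if (bestPlan < 0) { printf("no feasible plan\n"); return 1; }
    printf("cheapest comfortable plan ($%.2f): ", bestCost);
    for (int t = 0; t < N; ++t)
        printf("%s", ((bestPlan >> t) & 1) ? "ON " : "off ");
    printf("\n");  // the winning plan cools during cheap hours, before the peak
    return 0;
}
```

With these made-up numbers the cheapest feasible schedule runs the AC only during the $0.10 hours before the price peak, which is exactly the pre-cooling behavior described above.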

My Ph.D. research focused on enabling robots to learn and adapt online to new tasks using reinforcement learning (RL). Performing RL on robots poses significant challenges for existing algorithms, such as learning from few samples, acting in real time, and handling noise and delays in sensors and actuators. My thesis presented an RL algorithm called TEXPLORE that addressed these challenges and was tested on two robot platforms. I also participated in RoboCup, the international robot soccer competition in which teams program robots to play soccer autonomously. I worked on all aspects of the robot soccer problem: computer vision, localization, ball tracking, opponent tracking, humanoid motion, and multi-robot coordination. In 2012, our team won the international RoboCup competition against a field of 25 teams.
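To give a flavor of the real-time constraint: a robot's control loop cannot pause while the agent learns a model or plans, so a real-time architecture decouples acting from model learning and planning by running them in parallel, with the action thread always returning the current best action on schedule. The sketch below shows only that threading pattern; the toy "planner" and all names are invented, and this is not the thesis code.

```cpp
// Minimal sketch of the real-time pattern: act at a fixed control
// frequency while planning improves the policy in the background.
// The trivial "planner" here is a stand-in for model-based planning.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>

std::mutex policyMu;
int bestAction = 0;            // shared "policy": a single best action here
std::atomic<bool> done{false};

// Background planner: keeps refining the policy with whatever time it has.
void planLoop() {
    int iter = 0;
    while (!done) {
        ++iter;                                    // stand-in for model rollouts
        {
            std::lock_guard<std::mutex> lk(policyMu);
            bestAction = iter % 4;                 // pretend planning improved things
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

int main() {
    std::thread planner(planLoop);
    auto next = std::chrono::steady_clock::now();
    for (int step = 0; step < 10; ++step) {        // 10 control cycles at 10 Hz
        next += std::chrono::milliseconds(100);
        int a;
        {   // act immediately from the current policy; never block on planning
            std::lock_guard<std::mutex> lk(policyMu);
            a = bestAction;
        }
        printf("cycle %d: action %d\n", step, a);
        std::this_thread::sleep_until(next);       // hold the real-time rate
    }
    done = true;
    planner.join();
    return 0;
}
```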

Links:

Curriculum Vitae
Google Scholar
LinkedIn

Publications

Ph.D. Thesis

  • T. Hester. TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains. Ph.D. Thesis, The University of Texas at Austin, Department of Computer Science, AI Laboratory, December 2012. (PDF)(Slides)

Book Chapters

  • S. Kalyanakrishnan, T. Hester, M. Quinlan, Y. Bentor, and P. Stone. Three Humanoid Soccer Platforms: Comparison and Synthesis. In RoboCup 2009: Robot Soccer World Cup XIII, Lecture Notes in Artificial Intelligence, J. Baltes, M. Lagoudakis, T. Naruse, and S. S. Ghidary, Editors, pp. 140–152, Springer-Verlag, 2010. (PDF)

Refereed Conferences

  • M. Vecerik, O. Sushkov, D. Barker, T. Rothorl, T. Hester, and J. Scholz. A practical approach to insertion with variable socket position using deep reinforcement learning. In IEEE International Conference on Robotics and Automation (ICRA), May 2019. (PDF)
  • T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, A. Sendonaris, G. Dulac-Arnold, I. Osband, J. Agapiou, J. Z. Leibo, and A. Gruslys. Deep Q-Learning from Demonstrations. In Association for the Advancement of Artificial Intelligence (AAAI), Feb 2018. (PDF)
  • P. Boissy, T. Hester, D. M. Sherrill, H. Corriveau, and P. Bonato. Monitoring Mobility Assistive Device Use in Patients After Stroke. In Proceedings of the 16th Congress of the International Society of Electrophysiology and Kinesiology (ISEK), June-July 2006.
  • T. Hester, R. Hughes, D. M. Sherrill, S. Patel, N. Huggins, A. Flaherty, D. Standaert, and P. Bonato. Adjusting DBS Settings to Optimize Parkinson’s Control Therapy. In Proceedings of the 16th Congress of the International Society of Electrophysiology and Kinesiology (ISEK), June-July 2006.

Refereed Workshop Papers

  • G. Dulac-Arnold, D. J. Mankowitz, and T. Hester. Challenges of real-world reinforcement learning. In ICML Workshop on Reinforcement Learning for Real Life (RLRL), June 2019. (PDF)
  • T. Hester and P. Stone. TEXPLORE: Real-Time Sample-Efficient Reinforcement Learning for Robots. In Proceedings of the AAAI Spring Symposium on Designing Intelligent Robots: Re-Integrating AI, March 2012.

Patents

  • T. Hester, A. J. Minich, and G. A. Heitz III. Enhanced automated environmental control system scheduling using a preference function. Apr. 3 2018. US Patent 9,933,177.
  • T. A. Hester, E. J. Fisher, and P. Khandelwal. Predictively controlling an environmental control system. Jan. 16 2018. US Patent 9,869,484.
  • T. A. Hester, A. J. Minich, and G. A. Heitz III. Enhanced automated control scheduling. Sept. 26 2017. US Patent 9,772,116.
  • S. Y. Shafi, T. Hester, J. Ben-Meshulam, and S. R. Dey. Identification of similar users. Sept. 5 2017. US Patent 9,756,478.
  • M. R. Malhotra, S. Le Guen, J. A. Boyd, J. T. Lee, and T. Hester. Learned overrides for home security. Dec. 13 2016. US Patent 9,520,049.

Pending Patents

  • M. R. Malhotra, S. Le Guen, J. A. Boyd, J. T. Lee, and T. Hester. Operating a security system. Mar. 5 2019. US Patent App. 10/223,896.
  • R. A. Evans, J. Gao, M. C. Ryan, G. Dulac-Arnold, J. K. Scholz, and T. A. Hester. Optimizing data center controls using neural networks. July 19 2018. US Patent App. 15/410,547.
  • J. Crimins, S. Ruffner, A. Minich, T. Hester, and A. Sahl. Thermostat algorithms and architecture for efficient operation at low temperatures. Apr. 12 2018. US Patent App. 15/286,564.
  • W. Greene, S. McGaraghan, J. Crimins, S. Ruffner, A. Minich, T. Hester, A. Sahl, and P. Subramani. Architecture for thermostat control during time-of-use intervals. Dec. 21 2017. US Patent App. 15/187,562.
  • J. Crimins, S. Ruffner, A. Minich, T. Hester, A. Sahl, and P. Subramani. Architecture for thermostat control during peak intervals. Dec. 21 2017. US Patent App. 15/187,565.
  • I. Karp, L. Stesin, C. Pi-Sunyer, M. A. McBride, A. Dubman, J. Lyons, S. W. Kortz, G. J. Hu, A. Surya, A. Thelen, et al. Methods and apparatus for using smart environment devices via application program interfaces. July 6 2017. US Patent App. 15/380,767.
  • P. Verhoeven and T. Hester. Coordinating energy use of disparately-controlled devices in the smart home based on near-term predicted HVAC control trajectories. Apr. 13 2017. US Patent App. 14/881,807.
  • P. Verhoeven and T. Hester. Persistent home thermal comfort model reusable across multiple sensor and device configurations in a smart home. Feb. 23 2017. US Patent App. 14/832,675.
  • P. Verhoeven and T. Hester. Persistent thermal model and method of using same for automatically determining the presence of an additional thermal source other than the HVAC system being controlled. Feb. 23 2017. US Patent App. 14/832,702.
  • P. P. Reddy, M. Malhotra, E. J. Fisher, T. Hester, M. A. McBride, and Y. Matsuoka. Intelligent configuration of a smart environment based on arrival time. Dec. 24 2015. US Patent App. 14/531,805.

Technical Reports

  • D. J. Mankowitz, N. Levine, R. Jeong, A. Abdolmaleki, J. T. Springenberg, T. A. Mann, T. Hester, and M. A. Riedmiller. Robust reinforcement learning for continuous control with model misspecification. arXiv, vol. abs/1906.07516, 2019. (PDF)
  • T. Pohlen, B. Piot, T. Hester, M. G. Azar, D. Horgan, D. Budden, G. Barth-Maron, H. van Hasselt, J. Quan, M. Vecerik, M. Hessel, R. Munos, and O. Pietquin. Observe and look further: Achieving consistent performance on Atari. arXiv, vol. abs/1805.11593, 2018. (PDF)
  • G. Dalal, K. Dvijotham, M. Vecerik, T. Hester, C. Paduraru, and Y. Tassa. Safe exploration in continuous action spaces. arXiv, vol. abs/1801.08757, 2018. (PDF)
  • M. Vecerik, T. Hester, J. Scholz, F. Wang, O. Pietquin, B. Piot, N. Heess, T. Rothorl, T. Lampe, and M. A. Riedmiller. Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. arXiv, vol. abs/1707.08817, 2017. (PDF)
  • T. A. Mann, H. Penedones, S. Mannor, and T. Hester. Adaptive lambda least-squares temporal difference learning. arXiv, vol. abs/1612.09465, 2016. (PDF)
  • Nest Labs. Thermal model and HVAC control white paper. Nov 2015. (PDF)
  • Nest Labs. Enhanced Auto-Schedule. Nov 2014. (PDF)
  • S. Barrett, K. Genter, M. Hausknecht, T. Hester, P. Khandelwal, J. Lee, M. Quinlan, A. Tian, P. Stone, and M. Sridharan. Austin Villa 2010 Standard Platform Team Report. Technical Report UT-AI-TR-11-01, The University of Texas at Austin, Department of Computer Science, AI Laboratory, 2011. (PDF)
  • T. Hester, M. Quinlan, P. Stone, and M. Sridharan. UT Austin Villa 2009: Naos Across Texas. Technical Report UT-AI-TR-09-08, The University of Texas at Austin, Department of Computer Science, AI Laboratory, 2009. (PDF)

Invited Talks

  • A Real-Time Model-Based Reinforcement Learning Architecture for Robot Control. At The 6th Barbados Workshop on Reinforcement Learning, March 2011.
  • Generalized Model Learning for Reinforcement Learning in Factored Domains. At The 4th Barbados Workshop on Reinforcement Learning, April 2009.

RoboCup Results

  • 2012 Standard Platform League WORLD CHAMPIONS
  • 2012 Standard Platform League US Open Champions
  • 2010 Standard Platform League 3rd Place International Competition
  • 2010 Standard Platform League US Open Champions
  • 2009 Standard Platform League US Open Champions

Videos

A Practical Approach to Insertion Using Deep Reinforcement Learning

Video of our DDPGfD agent (DDPG from Demonstrations) learning to insert deformable objects.

DQfD playing Montezuma's Revenge

Video of the DQfD agent, which combined Deep Q-Learning with learning from demonstrations, playing Montezuma's Revenge.

DQfD playing Pitfall

Video of the DQfD agent, which combined Deep Q-Learning with learning from demonstrations, playing Pitfall.

DQfD playing Hero

Video of the DQfD agent, which combined Deep Q-Learning with learning from demonstrations, playing Hero.

2012 RoboCup Final: Austin Villa vs. B-Human

Video of our 2012 SPL final against B-Human, who had won the last 3 years and had never lost a game.

2012 RoboCup Semi-Final: Austin Villa vs. rUNSWift

Video of our 2012 SPL semi-final against rUNSWift. This was a very exciting game. We spent the entire game tied, down 1, or down 2, until taking a lead with 1:30 left and holding on to win.

2010 RoboCup Highlights

Highlights of TT-UT Austin Villa at the 2010 RoboCup Standard Platform League competition in Singapore, where the team took 3rd place.

Learning to Score Penalty Kicks via Reinforcement Learning

The accompanying video for our ICRA 2010 paper, where we learn to score penalty kicks via a novel model-based reinforcement learning method.

2009 RoboCup Highlights

Highlights of TT-UT Austin Villa at the 2009 RoboCup Standard Platform League. TT-UT Austin Villa finished in 4th place, losing to only two teams during the tournament.

2009 US Open Highlights

Highlights of TT-UT Austin Villa at the 2009 US Open. TT-UT Austin Villa won the 2009 US Open with a finals win over UPenn (1-1 tie, 3-2 in penalty kicks).

Aibo Highlights

This video shows highlights (both shots and saves) from demonstrations held during Explore UT on March 7, 2009.

Teaching

In the Spring 2013 semester, I taught the CS 378: Autonomous Intelligent Robotics (FRI) course.

In the Fall 2012 semester, I taught the CS 344M: Autonomous Multiagent Systems course.

In the Spring 2012 semester, I was the TA for CS 378: Autonomous Vehicles in Traffic I. This course is part of the Freshman Research Initiative (FRI).

In the Fall 2009 semester, I was the TA for CS 393R: Autonomous Robotics. I won the department's Outstanding TA Award.

In the Spring 2009 semester, I was a TA for CS 307: Foundations of Computing.

Open Source Code

I have released a package (rl-texplore-ros-pkg) of reinforcement learning code for ROS. It contains a set of RL agents and environments, along with a formalism for them to communicate via ROS messages. In particular, the agents include an implementation of our TEXPLORE agent (see our ICDL paper) and our real-time architecture for model-based agents (see our ICRA paper). A common interface is defined for agents, environments, models, and planners, so it should be easy to add new agents, or to add new model learning or planning methods to the existing general model-based agent; the real-time architecture should work with any model learning method that fits the defined interface. And because the agents communicate through ROS messages, they are easy to integrate with robots that already use a ROS architecture, making it straightforward to perform reinforcement learning on robots.
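As a rough sketch of what such a common interface looks like: an environment exposes its state, accepts actions, and reports rewards; an agent maps states and rewards to actions; and one generic episode loop (or a pair of ROS nodes exchanging state/reward and action messages) can then run any agent against any environment. The names below are illustrative rather than the package's exact API; see rl-texplore-ros-pkg for the real headers and message definitions.

```cpp
// Illustrative agent/environment interfaces in the spirit of the package;
// not its exact API. In the package, the same exchange can also happen
// between two ROS nodes via state/reward and action messages.
#include <cstdio>
#include <vector>

struct Environment {
    virtual std::vector<float> sensation() = 0;   // current state features
    virtual float apply(int action) = 0;          // take action, return reward
    virtual bool terminal() = 0;
    virtual ~Environment() {}
};

struct Agent {
    virtual int first_action(const std::vector<float>& s) = 0;
    virtual int next_action(float r, const std::vector<float>& s) = 0;
    virtual void last_action(float r) = 0;        // final reward of the episode
    virtual ~Agent() {}
};

// One generic loop runs any agent against any environment.
float runEpisode(Agent& agent, Environment& env) {
    float total = 0;
    int a = agent.first_action(env.sensation());
    for (;;) {
        float r = env.apply(a);
        total += r;
        if (env.terminal()) { agent.last_action(r); return total; }
        a = agent.next_action(r, env.sensation());
    }
}

// Tiny example pair: walk right along a 1-D corridor to the goal at cell 5.
struct Corridor : Environment {
    int pos = 0;
    std::vector<float> sensation() override { return {float(pos)}; }
    float apply(int action) override {
        pos += (action == 1 ? 1 : -1);
        if (pos < 0) pos = 0;
        return pos == 5 ? 1.0f : -0.01f;          // step cost, goal reward
    }
    bool terminal() override { return pos == 5; }
};

struct AlwaysRight : Agent {
    int first_action(const std::vector<float>&) override { return 1; }
    int next_action(float, const std::vector<float>&) override { return 1; }
    void last_action(float) override {}
};

int main() {
    Corridor env;
    AlwaysRight agent;
    printf("episode return: %.2f\n", runEpisode(agent, env));
    return 0;
}
```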