Reinforcement Learning

(CS 394R)


This course introduces the theory and practice of modern reinforcement learning. Reinforcement learning problems involve learning what to do—how to map situations to actions—so as to maximize a numerical reward signal. The course covers model-free and model-based reinforcement learning methods, especially those based on temporal-difference learning and policy gradient algorithms, and shows how to apply the essentials of RL theory to real-world sequential decision problems. Reinforcement learning is an essential part of fields ranging from modern robotics to game playing (e.g., poker, Go, and StarCraft). The material covered in this class provides an understanding of the core fundamentals of reinforcement learning, preparing students to apply it to problems of their choosing and to understand modern RL research. Professors Peter Stone and Scott Niekum are active reinforcement learning researchers and bring their expertise and excitement for RL to the class.

What You Will Learn

  • Fundamental reinforcement learning theory and how to apply it to real-world problems
  • Techniques for evaluating policies and learning optimal policies in sequential decision problems
  • The differences and tradeoffs between value-function, policy-search, and actor-critic methods in reinforcement learning
  • When and how to apply model-based vs. model-free learning methods
  • Approaches for balancing exploration and exploitation during learning
  • How to learn from both on-policy and off-policy data
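To give a flavor of one of the learning goals above, balancing exploration and exploitation, here is a minimal epsilon-greedy multi-armed bandit sketch. It is purely illustrative and not taken from the course materials; the arm means, epsilon value, and function name are all assumptions.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Sample-average action-value estimates with epsilon-greedy exploration.

    Illustrative only: rewards are Gaussian with the given means and unit
    variance; with probability epsilon we explore a random arm, otherwise
    we exploit the current greedy arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # estimated value of each arm
    n = [0] * k     # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=q.__getitem__)   # exploit: greedy arm
        r = rng.gauss(true_means[a], 1.0)          # noisy reward sample
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                  # incremental sample average
    return q

estimates = epsilon_greedy_bandit([0.2, 0.5, 0.9])
```

With enough steps, the estimate for the highest-mean arm dominates and the greedy choice concentrates on it, which is exactly the exploration/exploitation tradeoff the course examines in depth.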

Syllabus

  • Multi-Armed Bandits
  • Finite Markov Decision Processes
  • Dynamic Programming
  • Monte Carlo Methods
  • Temporal-Difference Learning
  • n-step Bootstrapping
  • Planning and Learning
  • On-Policy Prediction with Approximation
  • On-Policy Control with Approximation
  • Off-Policy Methods with Approximation
  • Eligibility Traces
  • Policy Gradient Methods
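As a taste of the temporal-difference methods listed above, the following is a minimal tabular TD(0) prediction sketch on a five-state random walk, a classic introductory example. The environment, constants, and function name are illustrative assumptions, not the course's own code.

```python
import random

def td0_random_walk(episodes=5000, alpha=0.1, seed=0):
    """TD(0) value prediction on a 5-state random walk (undiscounted).

    States 1..5 are non-terminal; 0 and 6 are terminal. Reward is 1 for
    reaching the right terminal, 0 otherwise. Under a uniform random
    policy the true values of states 1..5 are 1/6, 2/6, ..., 5/6.
    """
    rng = random.Random(seed)
    v = [0.0] * 7          # v[0] and v[6] are terminal (value 0)
    for _ in range(episodes):
        s = 3              # every episode starts in the middle state
        while s not in (0, 6):
            s2 = s + rng.choice((-1, 1))           # random policy: left/right
            r = 1.0 if s2 == 6 else 0.0
            # TD(0) update: v(s) <- v(s) + alpha * [r + v(s') - v(s)]
            v[s] += alpha * (r + v[s2] - v[s])
            s = s2
    return v[1:6]

values = td0_random_walk()
```

After many episodes the estimates increase from left to right toward the true values, updating from each transition as it happens rather than waiting for the episode to end, which is the key idea distinguishing TD learning from the Monte Carlo methods also covered in the syllabus.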

Course Category

Applications Course

Course Availability

Spring 2020
