I work in a branch of machine learning called reinforcement learning, which studies how to choose actions optimally over time when you don't know a priori what those actions do. Before each action you observe the current state of the system, and after executing the chosen action you observe the resulting state and any immediate reward. This problem seems incredibly broad, but in some sense it was solved decades ago by making one straightforward assumption, the Markov property: an action's effects depend only on the observed state at the time it is taken. However, although we have algorithms that can solve any such problem in the limit, they require too much data to be useful in practice.
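To make the setup concrete, here is a minimal sketch of the observe, act, observe loop using tabular Q-learning, one standard algorithm of the kind described above. The five-state chain, the reward of 1 for reaching the last state, and the constants `GAMMA`, `ALPHA`, and `episodes` are all assumptions chosen for illustration; none of them comes from any particular system.

```python
import random

N_STATES = 5          # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor (assumed)
ALPHA = 0.5           # learning rate (assumed)


def step(state, action):
    """Environment dynamics. The learner never reads this, it only sees outcomes."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=300, seed=0):
    rng = random.Random(seed)
    # One value estimate per (state, action) pair, all starting at zero.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)              # explore uniformly at random
            next_state, reward = step(state, action)  # observe the outcome
            # Markov assumption: the target depends only on the next state.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Best-looking action in each non-terminal state after learning.
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

Even on this tiny problem the learner needs hundreds of random episodes to settle its estimates, which hints at why purely tabular methods demand so much data on larger problems.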
My work focuses on making reinforcement learning algorithms more efficient by improving how they generalize from limited data to problems with very large or even infinite numbers of states. Most current algorithms employ a single, monolithic representation of knowledge called the value function, which estimates the long-term reward achievable from each state of the system. Once you've learned the value function, it's straightforward to select actions optimally, but to me the value function seems the wrong level of abstraction. Learning a value function directly from experience data is like trying to write a computer program directly in assembly language. I'm trying to find the right analogue of a high-level programming language, which can be used to articulate how to generalize efficiently from data.
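As a small illustration of why a value function makes action selection straightforward, the sketch below computes state values on a toy chain by value iteration and then picks actions by a one-step lookahead. It assumes the one-step model is known, which is a simplification; the chain, the `model` function, and the constants are hypothetical and chosen only for this example.

```python
GAMMA = 0.9                    # discount factor (assumed)
N_STATES = 5                   # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = ("left", "right")


def model(state, action):
    """Known one-step model (an assumption): returns (next_state, reward)."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def value_iteration(sweeps=100):
    """Estimate the long-term reward achievable from each state."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):          # the terminal state keeps value 0
            outcomes = [model(s, a) for a in ACTIONS]
            values[s] = max(r + GAMMA * values[s2] for s2, r in outcomes)
    return values


def greedy_action(values, state):
    """Pick the action whose one-step lookahead value is largest."""
    def backup(action):
        next_state, reward = model(state, action)
        return reward + GAMMA * values[next_state]
    return max(ACTIONS, key=backup)


values = value_iteration()
policy = [greedy_action(values, s) for s in range(N_STATES - 1)]
print(values, policy)
```

Note that the table stores one number per state, so nothing learned about one state transfers to any other; that lack of built-in generalization is the sense in which the representation is low-level.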
So far, I've assembled into one unified "language of generalization" some ideas for improving reinforcement learning that were previously studied in isolation: learning models of the system dynamics (instead of just state and action values), function approximation methods for learning in infinite and continuous systems, and hierarchical methods for learning values of high-level actions that recursively invoke lower-level actions. The result is a way of learning hierarchies of abstract models that generalizes from data more efficiently than current methods.