UTCS Colloquia/AI - Bruno Castro da Silva, "Learning Parameterized Motor Skills"

Contact Name: 
Karl Pichotta
GDC 3.516
Apr 22, 2014 11:00am - 12:30pm

Signup Schedule: https://apps.cs.utexas.edu/talkschedules/cgi/list_events.cgi

Speaker: Bruno Castro da Silva/University of Massachusetts Amherst

UTCS Host: Peter Stone

Audience: UTCS Faculty, Graduate Students, Undergraduate Students, and Outside Interested Parties


Flexible skills are one of the fundamental building blocks required to design truly autonomous robots. When solving a control problem, an agent can learn a single policy, but that policy may fail if the task varies or if the agent faces novel, unknown contexts. Learning a separate policy for every possible variation of a task or context is often infeasible. To address this problem, we introduce a general framework for learning reusable, parameterized skills.

Parameterized skills are flexible behaviors that can produce, on demand, policies for any task drawn from a distribution of related control problems. Once acquired, they can be used to solve novel variations of a task, even ones with which the agent has no direct experience. They also allow for the construction of hierarchically structured policies and help the agent abstract away details of lower-level controllers.
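Concretely, a parameterized skill can be viewed as a learned mapping from a task parameter (e.g., the target distance of a throw) to the parameters of a policy that solves that task. The toy sketch below illustrates this idea with polynomial ridge regression fit to a handful of task instances; the variable names, the quadratic "true" skill, and the regression model are illustrative assumptions, not the method presented in the talk.

```python
import numpy as np

# A few training task instances: task parameter tau -> policy parameters theta.
# The underlying relationship (here quadratic) is unknown to the learner.
taus = np.array([0.5, 1.0, 1.5, 2.0])                   # task parameters
thetas = np.stack([np.array([t, t**2]) for t in taus])  # per-task policy params

def fit_skill(taus, thetas, degree=2, lam=1e-6):
    """Fit a polynomial ridge regression mapping tau to theta."""
    X = np.vander(taus, degree + 1, increasing=True)    # polynomial features
    W = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ thetas)
    return W

def skill(W, tau, degree=2):
    """Produce policy parameters for a novel task parameter on demand."""
    x = np.vander(np.atleast_1d(tau), degree + 1, increasing=True)
    return (x @ W)[0]

W = fit_skill(taus, thetas)
theta_new = skill(W, 1.25)   # policy parameters for a task never trained on
```

The key point is that after fitting, the skill yields policy parameters for any task parameter in the distribution, not just the instances it was trained on.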

Previous work has shown that it is possible to transfer information between pairs of related control tasks and that parameterized policies can be constructed to deal with slight variations of a known domain. However, limited attention has been given to methods that allow an agent to autonomously synthesize general, parameterized skills from very few training samples.

We identify and solve three problems required to autonomously and efficiently learn parameterized skills. First, an agent observing or learning just a small number of task instances needs to be able to generalize such experiences and synthesize a single general, flexible skill. The skill should be capable of producing appropriate behaviors when invoked in novel contexts or applied to yet unseen variations of a task. Second, the agent must recognize when suboptimal policies experienced while learning one task instance may nonetheless be useful for solving different, but possibly related, tasks; this allows seemingly unsuccessful policies to be used as additional training samples, thus accelerating the construction of the skill. Lastly, the agent must be capable of actively selecting which tasks to train on next in order to more rapidly become competent in the skill. We evaluate our methods on a physical humanoid robot tasked with autonomously constructing a whole-body parameterized throwing skill from limited data.
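The third problem, active task selection, can be illustrated with a simple uncertainty-driven heuristic: fit an ensemble of models to the tasks practiced so far and train next on the candidate task where the ensemble disagrees most. The sketch below uses a noise-perturbed ensemble of linear fits over a one-dimensional task space; the setup and names are assumptions for illustration, not the talk's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

trained_taus = np.array([0.2, 0.5, 1.0, 1.5, 1.9])  # tasks practiced so far
trained_thetas = trained_taus ** 2                  # observed policy params (1-D)
candidates = np.linspace(0.0, 2.0, 21)              # tasks the agent could try next

def ensemble_predictions(taus, thetas, query, n_models=20, noise=0.05):
    """Predict query outputs with linear models fit to noise-perturbed data."""
    preds = []
    for _ in range(n_models):
        noisy = thetas + rng.normal(0.0, noise, size=len(thetas))
        coef = np.polyfit(taus, noisy, deg=1)       # one perturbed linear model
        preds.append(np.polyval(coef, query))
    return np.array(preds)

preds = ensemble_predictions(trained_taus, trained_thetas, candidates)
uncertainty = preds.std(axis=0)                     # disagreement per candidate
next_task = candidates[np.argmax(uncertainty)]      # train where models disagree most
```

Selecting the most uncertain task focuses training where the skill's predictions are least reliable, which is the intuition behind becoming competent from fewer samples.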


Bruno Castro da Silva is a Ph.D. candidate at the University of Massachusetts, working under the supervision of Prof. Andrew Barto. He received his B.S. in Computer Science from the Federal University of Rio Grande do Sul (Brazil) in 2004, and his M.Sc. from the same university in 2007. On several occasions between 2011 and 2013, Bruno worked as a visiting researcher at the Laboratory of Computational Embodied Neuroscience in Rome, Italy, developing novel control algorithms for the iCub robot. His research interests lie at the intersection of machine learning, optimal control theory, and robotics, and include the construction of reusable motor skills, active learning, efficient exploration of large state spaces, and Bayesian optimization applied to control.