UTCS AI Colloquia - Kevin Murphy, Research Scientist, Google, "From Big Data to Big Knowledge"

Contact Name: Karl Pichotta
Location: GDC 6.302
Date: Nov 22, 2013, 11:00am - 12:00pm

Signup Schedule: http://apps.cs.utexas.edu/talkschedules/cgi/list_events.cgi

Talk Audience: UTCS Faculty, Grads, Undergrads, Other Interested Parties

Host: Dana Ballard

Talk Abstract: We are drowning in big data, but a lot of it is hard to interpret. For example, Google indexes about 40B webpages, but these are just represented as bags of words, which don't mean much to a computer. To get from "strings to things", Google introduced the Knowledge Graph (KG), which is a database of facts about entities (people, places, movies, etc.) and their relations (nationality, geo-containment, actor roles, etc.). KG is based on Freebase, but supplements it with various other structured data sources. Although KG is very large (about 500M nodes/entities and 30B edges/relations), it is still very incomplete. For example, 94% of the people are missing their place of birth, and 78% have no known nationality; these are examples of missing links in the graph. In addition, we are missing many nodes (corresponding to new entities), as well as new types of nodes and edges (corresponding to extensions to the schema). In this talk, I will survey some of the efforts we are engaged in to try to "grow" KG automatically using machine learning methods. In particular, I will summarize our work on the problems of entity linkage, relation extraction, and link prediction, using data extracted from natural language text as well as tabular data found on the web.

Speaker Bio: Kevin Murphy is a research scientist at Google in Mountain View, California, where he is working on information extraction and probabilistic knowledge bases. Before joining Google in 2011, he was an associate professor of computer science and statistics at the University of British Columbia in Vancouver, Canada. Before starting at UBC in 2004, he was a postdoc at MIT. Kevin got his BA from U. Cambridge, his MEng from U. Pennsylvania, and his PhD from UC Berkeley. He has published over 50 papers in refereed conferences and journals related to machine learning and graphical models, as well as an 1100-page textbook called "Machine Learning: A Probabilistic Perspective" (MIT Press, 2012), which is currently the best-selling machine learning book on Amazon.com. Kevin is also a co-Editor-in-Chief of the Journal of Machine Learning Research.