Statistical Relational Learning (SRL) studies techniques that combine the
strengths of relational learning (e.g. inductive logic programming) and probabilistic learning
(e.g. Bayesian networks). By combining the power of logic and probability,
such systems can perform robust and accurate reasoning and learning about
complex relational data. See the book: Introduction to Statistical Relational
Learning. Our work in the area has primarily focused on applications of
SRL methods to problems in natural language processing, transfer learning, and abductive reasoning.

- Probabilistic Soft Logic for Semantic Textual Similarity

[Details] [PDF]

Islam Beltagy and Katrin Erk and Raymond J. Mooney

To appear in *Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL-14)*, Baltimore, MD, 2014.

Probabilistic Soft Logic (PSL) is a recently developed framework for probabilistic logic. We use PSL to combine logical and distributional representations of natural-language meaning, where distributional information is represented in the form of weighted inference rules. We apply this framework to the task of Semantic Textual Similarity (STS) (i.e. judging the semantic similarity of natural-language sentences), and show that PSL gives improved results compared to a previous approach based on Markov Logic Networks (MLNs) and a purely distributional approach.

ML ID: 300
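For readers unfamiliar with PSL, the following is a minimal sketch of the soft-truth semantics it is generally described as using: atoms take truth values in [0, 1], connectives follow the Lukasiewicz relaxation, and a weighted rule is penalized by its distance to satisfaction. The function names are hypothetical and chosen for illustration; this is not the implementation used in the paper above.

```python
# Illustrative sketch only: function names are hypothetical, not from any PSL library.

def l_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction of two soft truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def l_or(a: float, b: float) -> float:
    """Lukasiewicz disjunction of two soft truth values in [0, 1]."""
    return min(1.0, a + b)


def l_not(a: float) -> float:
    """Soft negation."""
    return 1.0 - a


def rule_penalty(weight: float, body: float, head: float) -> float:
    """Weighted distance to satisfaction of the rule body -> head.

    The rule is fully satisfied (penalty 0) when head >= body.
    """
    return weight * max(0.0, body - head)
```

In this sketch a rule whose body is much "truer" than its head contributes a large penalty, and inference would look for truth values that keep the total penalty low.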

- Plan Recognition Using Statistical Relational Models

[Details] [PDF]

Sindhu Raghavan and Parag Singla and Raymond J. Mooney

In Sukthankar, G. and Geib, C. and Bui, H.H. and Pynadath, D. and Goldman, R.P., editors, *Plan, Activity, and Intent Recognition: Theory and Practice*, 57--85, Burlington, MA, 2014. Morgan Kaufmann.

Plan recognition is the task of predicting an agent's top-level plans based on its observed actions. It is an abductive reasoning task that involves inferring plans that best explain observed actions. Most existing approaches to plan recognition and other abductive reasoning tasks either use first-order logic (or subsets of it) or probabilistic graphical models. While the former cannot handle uncertainty in the data, the latter cannot handle structured representations. To overcome these limitations, we explore the application of statistical relational models that combine the strengths of both first-order logic and probabilistic graphical models to plan recognition. Specifically, we introduce two new approaches to abductive plan recognition using Bayesian Logic Programs (BLPs) and Markov Logic Networks (MLNs). Neither of these formalisms is suited for abductive reasoning because of the deductive nature of the underlying logical inference. In this work, we propose approaches to adapt both these formalisms for abductive plan recognition. We present an extensive evaluation of our approaches on three benchmark datasets on plan recognition, comparing them with existing state-of-the-art methods.

ML ID: 298

- Online Inference-Rule Learning from Natural-Language Extractions

[Details] [PDF] [Poster]

Sindhu Raghavan and Raymond J. Mooney

In *Proceedings of the 3rd Statistical Relational AI (StaRAI-13) workshop at AAAI '13*, July 2013.

In this paper, we consider the problem of learning commonsense knowledge in the form of first-order rules from incomplete and noisy natural-language extractions produced by an off-the-shelf information extraction (IE) system. Much of the information conveyed in text must be inferred from what is explicitly stated since easily inferable facts are rarely mentioned. The proposed rule learner accounts for this phenomenon by learning rules in which the body of the rule contains relations that are usually explicitly stated, while the head employs a less-frequently mentioned relation that is easily inferred. The rule learner processes training examples in an online manner to allow it to scale to large text corpora. Furthermore, we propose a novel approach to weighting rules using a curated lexical ontology like WordNet. The learned rules along with their parameters are then used to infer implicit information using a Bayesian Logic Program. Experimental evaluation on a machine reading testbed demonstrates the efficacy of the proposed methods.

ML ID: 287

- Montague Meets Markov: Deep Semantics with Probabilistic Logical Form

[Details] [PDF] [Slides]

Islam Beltagy, Cuong Chau, Gemma Boleda, Dan Garrette, Katrin Erk, Raymond Mooney

In *Proceedings of the Second Joint Conference on Lexical and Computational Semantics (\*Sem-2013)*, 11--21, Atlanta, GA, June 2013.

We combine logical and distributional representations of natural language meaning by transforming distributional similarity judgments into weighted inference rules using Markov Logic Networks (MLNs). We show that this framework supports both judging sentence similarity and recognizing textual entailment by appropriately adapting the MLN implementation of logical connectives. We also show that distributional phrase similarity, used as textual inference rules created on the fly, improves its performance.

ML ID: 285

- A Formal Approach to Linking Logical Form and Vector-Space Lexical Semantics

[Details] [PDF]

Dan Garrette, Katrin Erk, Raymond J. Mooney

In Harry Bunt, Johan Bos, and Stephen Pulman, editors, *Computing Meaning*, 27--48, Berlin, 2013. Springer.

First-order logic provides a powerful and flexible mechanism for representing natural language semantics. However, it is an open question of how best to integrate it with uncertain, weighted knowledge, for example regarding word meaning. This paper describes a mapping between predicates of logical form and points in a vector space. This mapping is then used to project distributional inferences to inference rules in logical form. We then describe first steps of an approach that uses this mapping to recast first-order semantics into the probabilistic models that are part of Statistical Relational AI. Specifically, we show how Discourse Representation Structures can be combined with distributional models for word meaning inside a Markov Logic Network and used to successfully perform inferences that take advantage of logical concepts such as negation and factivity as well as weighted information on word meaning in context.

ML ID: 284

- Bayesian Logic Programs for Plan Recognition and Machine Reading

[Details] [PDF] [Slides]

Sindhu Raghavan

PhD Thesis, Department of Computer Science, University of Texas at Austin, December 2012. 170 pages.

Several real world tasks involve data that is uncertain and relational in nature. Traditional approaches like first-order logic and probabilistic models either deal with structured data or uncertainty, but not both. To address these limitations, statistical relational learning (SRL), a new area in machine learning integrating both first-order logic and probabilistic graphical models, has emerged in the recent past. The advantage of SRL models is that they can handle both uncertainty and structured/relational data. As a result, they are widely used in domains like social network analysis, biological data analysis, and natural language processing. Bayesian Logic Programs (BLPs), which integrate both first-order logic and Bayesian networks, are a powerful SRL formalism developed in the recent past. In this dissertation, we develop approaches using BLPs to solve two real world tasks -- plan recognition and machine reading.

Plan recognition is the task of predicting an agent's top-level plans based on its observed actions. It is an abductive reasoning task that involves inferring cause from effect. In the first part of the dissertation, we develop an approach to abductive plan recognition using BLPs. Since BLPs employ logical deduction to construct the networks, they cannot be used effectively for abductive plan recognition as is. Therefore, we extend BLPs to use logical abduction to construct Bayesian networks and call the resulting model Bayesian Abductive Logic Programs (BALPs).

In the second part of the dissertation, we apply BLPs to the task of machine reading, which involves automatic extraction of knowledge from natural language text. Most information extraction (IE) systems identify facts that are explicitly stated in text. However, much of the information conveyed in text must be inferred from what is explicitly stated since easily inferable facts are rarely mentioned. Human readers naturally use common sense knowledge and "read between the lines" to infer such implicit information from the explicitly stated facts. Since IE systems do not have access to common sense knowledge, they cannot perform deeper reasoning to infer implicitly stated facts. Here, we first develop an approach using BLPs to infer implicitly stated facts from natural language text. It involves learning uncertain common sense knowledge in the form of probabilistic first-order rules by mining a large corpus of automatically extracted facts using an existing rule learner. These rules are then used to derive additional facts from extracted information using BLP inference. We then develop an online rule learner that handles the concise, incomplete nature of natural-language text and learns first-order rules from noisy IE extractions. Finally, we develop a novel approach to calculate the weights of the rules using a curated lexical ontology like WordNet.

Both tasks described above involve inference and learning from partially observed or incomplete data. In plan recognition, the underlying cause or the top-level plan that resulted in the observed actions is not known or observed. Further, only a subset of the executed actions can be observed by the plan recognition system, resulting in partially observed data. Similarly, in machine reading, since some information is only implicitly stated, it is rarely observed in the data. In this dissertation, we demonstrate the efficacy of BLPs for inference and learning from incomplete data. Experimental comparison on various benchmark data sets on both tasks demonstrates the superior performance of BLPs over state-of-the-art methods.

ML ID: 280

- Learning to "Read Between the Lines" using Bayesian Logic Programs

[Details] [PDF] [Slides]

Sindhu Raghavan and Raymond J. Mooney and Hyeonseo Ku

In *Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012)*, 349--358, July 2012.

Most information extraction (IE) systems identify facts that are explicitly stated in text. However, in natural language, some facts are implicit, and identifying them requires "reading between the lines". Human readers naturally use common sense knowledge to infer such implicit information from the explicitly stated facts. We propose an approach that uses Bayesian Logic Programs (BLPs), a statistical relational model combining first-order logic and Bayesian networks, to infer additional implicit information from extracted facts. It involves learning uncertain commonsense knowledge (in the form of probabilistic first-order rules) from natural language text by mining a large corpus of automatically extracted facts. These rules are then used to derive additional facts from extracted information using BLP inference. Experimental evaluation on a benchmark data set for machine reading demonstrates the efficacy of our approach.

ML ID: 270

- Constraint Propagation for Efficient Inference in Markov Logic

[Details] [PDF] [Slides]

Tivadar Papai, Parag Singla and Henry Kautz

In *Proceedings of 17th International Conference on Principles and Practice of Constraint Programming (CP 2011)*, Lecture Notes in Computer Science (LNCS), 691--705, September 2011.

Many real world problems can be modeled using a combination of hard and soft constraints. Markov Logic is a highly expressive language which represents the underlying constraints by attaching real-valued weights to formulas in first order logic. The weight of a formula represents the strength of the corresponding constraint. Hard constraints are represented as formulas with infinite weight. The theory is compiled into a ground Markov network over which probabilistic inference can be done. For many problems, hard constraints pose a significant challenge to the probabilistic inference engine. However, solving the hard constraints (partially or fully) beforehand outside of the probabilistic engine can hugely simplify the ground Markov network and speed probabilistic inference. In this work, we propose a generalized arc consistency algorithm that prunes the domains of predicates by propagating hard constraints. Our algorithm effectively performs unit propagation at a lifted level, avoiding the need to explicitly ground the hard constraints during the pre-processing phase, yielding a potentially exponential savings in space and time. Our approach results in much simplified domains, thereby making the inference significantly more efficient both in terms of time and memory. Experimental evaluation over one artificial and two real-world datasets shows the benefit of our approach.

ML ID: 268
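As a rough illustration of the Markov Logic semantics summarized in the abstract above (real-valued weights on formulas, infinite weight for hard constraints, a distribution over ground worlds), here is a minimal brute-force sketch. The atoms, formulas, and weights are hypothetical toy values, and exhaustive enumeration is used only because the example is tiny; it is not the lifted algorithm proposed in the paper.

```python
import itertools
import math

# Hypothetical ground atoms for a toy domain with one constant, A.
ATOMS = ["Smokes(A)", "Cancer(A)"]


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


# (weight, ground formula). math.inf marks a hard constraint.
FORMULAS = [
    (1.5, lambda w: implies(w["Smokes(A)"], w["Cancer(A)"])),  # soft constraint
    (math.inf, lambda w: w["Smokes(A)"]),                      # hard constraint
]


def world_score(world: dict) -> float:
    """Unnormalized score: exp(sum of weights of satisfied soft formulas).

    A world that violates any hard constraint gets score 0.
    """
    total = 0.0
    for weight, formula in FORMULAS:
        satisfied = formula(world)
        if math.isinf(weight):
            if not satisfied:
                return 0.0
        elif satisfied:
            total += weight
    return math.exp(total)


def distribution() -> list:
    """Enumerate all truth assignments and normalize their scores."""
    worlds = [
        dict(zip(ATOMS, values))
        for values in itertools.product([False, True], repeat=len(ATOMS))
    ]
    scores = [world_score(w) for w in worlds]
    z = sum(scores)  # partition function
    return [(w, s / z) for w, s in zip(worlds, scores)]
```

Here the hard constraint removes every world in which `Smokes(A)` is false before any probability mass is assigned, which is the kind of simplification that resolving hard constraints ahead of probabilistic inference aims for.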

- Online Structure Learning for Markov Logic Networks

[Details] [PDF] [Slides]

Tuyen N. Huynh and Raymond J. Mooney

In *Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2011)*, 81--96, September 2011.

Most existing learning methods for Markov Logic Networks (MLNs) use batch training, which becomes computationally expensive and eventually infeasible for large datasets with thousands of training examples which may not even all fit in main memory. To address this issue, previous work has used online learning to train MLNs. However, they all assume that the model's structure (set of logical clauses) is given, and only learn the model's parameters. However, the input structure is usually incomplete, so it should also be updated. In this work, we present OSL, the first algorithm that performs both online structure and parameter learning for MLNs. Experimental results on two real-world datasets for natural-language field segmentation show that OSL outperforms systems that cannot revise structure.

ML ID: 267

- Abductive Plan Recognition by Extending Bayesian Logic Programs

[Details] [PDF] [Slides]

Sindhu Raghavan, Raymond J. Mooney

In *Proceedings of the European Conference on Machine Learning/Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2011)*, 629--644, September 2011.

Plan recognition is the task of predicting an agent's top-level plans based on its observed actions. It is an abductive reasoning task that involves inferring cause from effect. Most existing approaches to plan recognition use either first-order logic or probabilistic graphical models. While the former cannot handle uncertainty, the latter cannot handle structured representations. In order to overcome these limitations, we develop an approach to plan recognition using Bayesian Logic Programs (BLPs), which combine first-order logic and Bayesian networks. Since BLPs employ logical deduction to construct the networks, they cannot be used effectively for plan recognition. Therefore, we extend BLPs to use logical abduction to construct Bayesian networks and call the resulting model Bayesian Abductive Logic Programs (BALPs). We learn the parameters in BALPs using the Expectation Maximization algorithm adapted for BLPs. Finally, we present an experimental evaluation of BALPs on three benchmark data sets and compare its performance with the state-of-the-art for plan recognition.

ML ID: 266

- Abductive Markov Logic for Plan Recognition

[Details] [PDF] [Slides]

Parag Singla and Raymond J. Mooney

In *Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-2011)*, 1069--1075, August 2011.

Plan recognition is a form of abductive reasoning that involves inferring plans that best explain sets of observed actions. Most existing approaches to plan recognition and other abductive tasks employ either purely logical methods that do not handle uncertainty, or purely probabilistic methods that do not handle structured representations. To overcome these limitations, this paper introduces an approach to abductive reasoning using a first-order probabilistic logic, specifically Markov Logic Networks (MLNs). It introduces several novel techniques for making MLNs efficient and effective for abduction. Experiments on three plan recognition datasets show the benefit of our approach over existing methods.

ML ID: 263

- Extending Bayesian Logic Programs for Plan Recognition and Machine Reading

[Details] [PDF] [Slides]

Sindhu V. Raghavan

Technical Report, PhD proposal, Department of Computer Science, The University of Texas at Austin, May 2011.

Statistical relational learning (SRL) is the area of machine learning that integrates both first-order logic and probabilistic graphical models. The advantage of these formalisms is that they can handle both uncertainty and structured/relational data. As a result, they are widely used in domains like social network analysis, biological data analysis, and natural language processing. Bayesian Logic Programs (BLPs), which integrate both first-order logic and Bayesian networks, are a powerful SRL formalism developed in the recent past. In this proposal, we focus on applying BLPs to two real-world tasks -- plan recognition and machine reading.

Plan recognition is the task of predicting an agent's top-level plans based on its observed actions. It is an abductive reasoning task that involves inferring cause from effect. In the first part of the proposal, we develop an approach to abductive plan recognition using BLPs. Since BLPs employ logical deduction to construct the networks, they cannot be used effectively for plan recognition as is. Therefore, we extend BLPs to use logical abduction to construct Bayesian networks and call the resulting model Bayesian Abductive Logic Programs (BALPs). Experimental evaluation on three benchmark data sets demonstrates that BALPs outperform the existing state-of-the-art methods like Markov Logic Networks (MLNs) for plan recognition.

For future work, we propose to apply BLPs to the task of machine reading, which involves automatic extraction of knowledge from natural language text. Present day information extraction (IE) systems that are trained for machine reading are limited by their ability to extract only factual information that is stated explicitly in the text. We propose to improve the performance of an off-the-shelf IE system by inducing general knowledge rules about the domain using the facts already extracted by the IE system. We then use these rules to infer additional facts using BLPs, thereby improving the recall of the underlying IE system. Here again, the standard inference used in BLPs cannot be used to construct the networks. So, we extend BLPs to perform forward inference on all facts extracted by the IE system and then construct the ground Bayesian networks. We initially use an existing inductive logic programming (ILP) based rule learner to learn the rules. In the longer term, we would like to develop a rule/structure learner that is capable of learning an even better set of first-order rules for BLPs.

ML ID: 258

- Improving the Accuracy and Scalability of Discriminative Learning Methods for Markov Logic Networks

[Details] [PDF] [Slides]

Tuyen N. Huynh

PhD Thesis, Department of Computer Science, University of Texas at Austin, May 2011. 159 pages.

Many real-world problems involve data that have both complex structure and uncertainty. Statistical relational learning (SRL) is an emerging area of research that addresses the problem of learning from these noisy structured/relational data. Markov logic networks (MLNs), sets of weighted first-order logic formulae, are a simple but powerful SRL formalism that generalizes both first-order logic and Markov networks. MLNs have been successfully applied to a variety of real-world problems ranging from extracting knowledge from text to visual event recognition. Most of the existing learning algorithms for MLNs are in the generative setting: they try to learn a model that is equally capable of predicting the values of all variables given an arbitrary set of evidence; and they do not scale to problems with thousands of examples. However, many real-world problems in structured/relational data are discriminative, where the variables are divided into two disjoint sets, input and output, and the goal is to correctly predict the values of the output variables given evidence data about the input variables. In addition, these problems usually involve data that have thousands of examples. Thus, it is important to develop new discriminative learning methods for MLNs that are more accurate and more scalable, which are the topics addressed in this thesis.

First, we present a new method that discriminatively learns both the structure and parameters for a special class of MLNs where all the clauses are non-recursive ones. Non-recursive clauses arise in many learning problems in Inductive Logic Programming. To further improve the predictive accuracy, we propose a max-margin approach to learning weights for MLNs. Then, to address the issue of scalability, we present CDA, an online max-margin weight learning algorithm for MLNs. After that, we present OSL, the first algorithm that performs both online structure learning and parameter learning. Finally, we address an issue arising in applying MLNs to many real-world problems: learning in the presence of many hard constraints. Including hard constraints during training greatly increases the computational complexity of the learning problem. Thus, we propose a simple heuristic for selecting which hard constraints to include during training.

Experimental results on several real-world problems show that the proposed methods are more accurate, more scalable (can handle problems with thousands of examples), or both more accurate and more scalable than existing learning methods for MLNs.

ML ID: 257

- Online Max-Margin Weight Learning for Markov Logic Networks

[Details] [PDF] [Slides]

Tuyen N. Huynh and Raymond J. Mooney

In *Proceedings of the Eleventh SIAM International Conference on Data Mining (SDM11)*, 642--651, Mesa, Arizona, USA, April 2011.

Most of the existing weight-learning algorithms for Markov Logic Networks (MLNs) use batch training which becomes computationally expensive and even infeasible for very large datasets since the training examples may not fit in main memory. To overcome this problem, previous work has used online learning algorithms to learn weights for MLNs. However, this prior work has only applied existing online algorithms, and there is no comprehensive study of online weight learning for MLNs. In this paper, we derive a new online algorithm for structured prediction using the primal-dual framework, apply it to learn weights for MLNs, and compare against existing online algorithms on three large, real-world datasets. The experimental results show that our new algorithm generally achieves better accuracy than existing methods, especially on noisy datasets.

ML ID: 255

- Implementing Weighted Abduction in Markov Logic

[Details] [PDF]

James Blythe, Jerry R. Hobbs, Pedro Domingos, Rohit J. Kate, Raymond J. Mooney

In *Proceedings of the International Conference on Computational Semantics*, 55--64, Oxford, England, January 2011.

Abduction is a method for finding the best explanation for observations. Arguably the most advanced approach to abduction, especially for natural language processing, is weighted abduction, which uses logical formulas with costs to guide inference. But it has no clear probabilistic semantics. In this paper we propose an approach that implements weighted abduction in Markov logic, which uses weighted first-order formulas to represent probabilistic knowledge, pointing toward a sound probabilistic semantics for weighted abduction. Application to a series of challenge problems shows the power and coverage of our approach.

ML ID: 254

- Integrating Logical Representations with Probabilistic Information using Markov Logic

[Details] [PDF] [Slides]

Dan Garrette, Katrin Erk, Raymond Mooney

In *Proceedings of the International Conference on Computational Semantics*, 105--114, Oxford, England, January 2011.

First-order logic provides a powerful and flexible mechanism for representing natural language semantics. However, it is an open question of how best to integrate it with uncertain, probabilistic knowledge, for example regarding word meaning. This paper describes the first steps of an approach to recasting first-order semantics into the probabilistic models that are part of Statistical Relational AI. Specifically, we show how Discourse Representation Structures can be combined with distributional models for word meaning inside a Markov Logic Network and used to successfully perform inferences that take advantage of logical concepts such as factivity as well as probabilistic information on word meaning in context.

ML ID: 253

- Online Max-Margin Weight Learning with Markov Logic Networks

[Details] [PDF] [Slides]

Tuyen N. Huynh and Raymond J. Mooney

In *Proceedings of the AAAI-10 Workshop on Statistical Relational AI (Star-AI 10)*, 32--37, Atlanta, GA, July 2010.

Most of the existing weight-learning algorithms for Markov Logic Networks (MLNs) use batch training which becomes computationally expensive and even infeasible for very large datasets since the training examples may not fit in main memory. To overcome this problem, previous work has used online learning algorithms to learn weights for MLNs. However, this prior work has only applied existing online algorithms, and there is no comprehensive study of online weight learning for MLNs. In this paper, we derive new online algorithms for structured prediction using the primal-dual framework, apply them to learn weights for MLNs, and compare against existing online algorithms on two large, real-world datasets. The experimental results show that the new algorithms achieve better accuracy than existing methods.

ML ID: 245

- Bayesian Abductive Logic Programs

[Details] [PDF] [Slides]

Sindhu Raghavan and Raymond Mooney

In *Proceedings of the AAAI-10 Workshop on Statistical Relational AI (Star-AI 10)*, 82--87, Atlanta, GA, July 2010.

In this paper, we introduce Bayesian Abductive Logic Programs (BALPs), a new formalism that integrates Bayesian Logic Programs (BLPs) and Abductive Logic Programming (ALP) for abductive reasoning. Like BLPs, BALPs also combine first-order logic and Bayesian networks. However, unlike BLPs that use logical deduction to construct Bayes nets, BALPs employ logical abduction. As a result, BALPs are more suited for solving problems like plan/activity recognition and diagnosis that require abductive reasoning. First, we present the necessary enhancements to BLPs in order to support logical abduction. Next, we apply BALPs to the task of plan recognition and demonstrate their efficacy on two data sets. We also compare the performance of BALPs with several existing approaches for abduction.

ML ID: 244
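To make the deduction/abduction contrast in the abstract above concrete, the sketch below performs purely logical abduction over propositional Horn rules: it chains backward from observed actions to sets of assumptions that would explain them. The rule base and atom names are hypothetical, the rules are assumed to be acyclic, and no probabilities are involved, so this illustrates only the logical step and not BALPs themselves.

```python
# Hypothetical rule base: each observable action maps to alternative rule bodies
# (head <- body). Atoms with no rules are treated as assumable top-level plans.
RULES = {
    "go_store": [["shopping"], ["robbing"]],
    "get_gun": [["robbing"], ["hunting"]],
}


def explanations(atom: str) -> list:
    """Return the sets of assumable atoms that logically imply `atom`.

    Assumes the rule base is acyclic, so the recursion terminates.
    """
    if atom not in RULES:
        return [frozenset([atom])]
    results = []
    for body in RULES[atom]:
        partial = [frozenset()]
        for sub_atom in body:
            partial = [p | e for p in partial for e in explanations(sub_atom)]
        results.extend(partial)
    return results


def explain_all(observations: list) -> list:
    """Combine per-observation explanations, smallest assumption sets first."""
    combined = [frozenset()]
    for obs in observations:
        combined = [c | e for c in combined for e in explanations(obs)]
    return sorted(set(combined), key=lambda s: (len(s), sorted(s)))
```

For the observations `go_store` and `get_gun`, the smallest explanation in this toy rule base is the single plan `robbing`; a probabilistic model would then be needed to rank competing explanations rather than relying on size alone.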

- Discriminative Learning with Markov Logic Networks

[Details] [PDF] [Slides]

Tuyen N. Huynh

October 2009. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.

Statistical relational learning (SRL) is an emerging area of research that addresses the problem of learning from noisy structured/relational data. Markov logic networks (MLNs), sets of weighted clauses, are a simple but powerful SRL formalism that combines the expressivity of first-order logic with the flexibility of probabilistic reasoning. Most of the existing learning algorithms for MLNs are in the generative setting: they try to learn a model that maximizes the likelihood of the training data. However, most of the learning problems in relational data are discriminative. So to utilize the power of MLNs, we need discriminative learning methods that well match these discriminative tasks.

In this proposal, we present two new discriminative learning algorithms for MLNs. The first one is a discriminative structure and weight learner for MLNs with non-recursive clauses. We use a variant of Aleph, an off-the-shelf Inductive Logic Programming (ILP) system, to learn a large set of Horn clauses from the training data, then we apply an L1-regularization weight learner to select a small set of non-zero weight clauses that maximizes the conditional log-likelihood (CLL) of the training data. The experimental results show that our proposed algorithm outperforms existing learning methods for MLNs and traditional ILP systems in terms of predictive accuracy, and its performance is comparable to state-of-the-art results on some ILP benchmarks. The second algorithm we present is a max-margin weight learner for MLNs. Instead of maximizing the CLL of the data like all existing discriminative weight learners for MLNs, the new weight learner tries to maximize the ratio between the probability of the correct label (the observable data) and that of the closest incorrect label (among all the wrong labels, this one has the highest probability), which can be formulated as an optimization problem called 1-slack structural SVM. This optimization problem can be solved by an efficient algorithm based on the cutting plane method. However, this cutting plane algorithm requires an efficient inference method as a subroutine. Unfortunately, exact inference in MLNs is intractable. So we develop a new approximation inference method for MLNs based on Linear Programming relaxation. Extensive experiments in two real-world MLN applications demonstrate that the proposed max-margin weight learner generally achieves higher F1 scores than the current best discriminative weight learner for MLNs.

For future work, our short-term goal is to develop a more efficient inference algorithm and test our max-margin weight learner on more complex problems where there are complicated relationships between the input and output variables and among the outputs. In the longer term, our plan is to develop more efficient learning algorithms through online learning and algorithms that revise both the clauses and their weights to improve predictive performance.

ML ID: 238
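The probability ratio that the max-margin learner in the proposal above maximizes can be written out under the standard log-linear form of an MLN. The notation here is an assumption introduced for illustration: $w$ is the clause weight vector, $n(x, y)$ is the vector of true-grounding counts of each clause given evidence $x$ and labeling $y$, and $\hat{y}$ is the most probable incorrect labeling.

```latex
\begin{align*}
P(y \mid x) &= \frac{1}{Z_x} \exp\!\big( w^{\top} n(x, y) \big), \\
\hat{y} &= \operatorname*{arg\,max}_{y' \neq y} \; w^{\top} n(x, y'), \\
\log \frac{P(y \mid x)}{P(\hat{y} \mid x)} &= w^{\top} \big( n(x, y) - n(x, \hat{y}) \big).
\end{align*}
```

Because the partition function $Z_x$ cancels in the ratio, the quantity being maximized is a linear margin in $w$, which is what allows the problem to be cast in structural SVM form.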

- Learning with Markov Logic Networks: Transfer Learning, Structure Learning, and an Application to Web Query Disambiguation

[Details] [PDF]

Lilyana Mihalkova

PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2009. 176 pages.

Traditionally, machine learning algorithms assume that training data is provided as a set of independent instances, each of which can be described as a feature vector. In contrast, many domains of interest are inherently multi-relational, consisting of entities connected by a rich set of relations. For example, the participants in a social network are linked by friendships, collaborations, and shared interests. Likewise, the users of a search engine are related by searches for similar items and clicks to shared sites. The ability to model and reason about such relations is essential not only because better predictive accuracy is achieved by exploiting this additional information, but also because frequently the goal is to predict whether a set of entities are related in a particular way. This thesis falls within the area of Statistical Relational Learning (SRL), which combines ideas from two traditions within artificial intelligence, first-order logic and probabilistic graphical models, to address the challenge of learning from multi-relational data. We build on one particular SRL model, Markov logic networks (MLNs), which consist of a set of weighted first-order-logic formulae and provide a principled way of defining a probability distribution over possible worlds. We develop algorithms for learning of MLN structure both from scratch and by transferring a previously learned model, as well as an application of MLNs to the problem of Web query disambiguation. The ideas we present are unified by two main themes: the need to deal with limited training data and the use of bottom-up learning techniques.

Structure learning, the task of automatically acquiring a set of dependencies among the relations in the domain, is a central problem in SRL. We introduce BUSL, an algorithm for learning MLN structure from scratch that proceeds in a more bottom-up fashion, breaking away from the tradition of top-down learning typical in SRL. Our approach first constructs a novel data structure called a Markov network template that is used to restrict the search space for clauses. Our experiments in three relational domains demonstrate that BUSL dramatically reduces the search space for clauses and attains a significantly higher accuracy than a structure learner that follows a top-down approach.

Accurate and efficient structure learning can also be achieved by transferring a model obtained in a source domain related to the current target domain of interest. We view transfer as a revision task and present an algorithm that diagnoses a source MLN to determine which of its parts transfer directly to the target domain and which need to be updated. This analysis focuses the search for revisions on the incorrect portions of the source structure, thus speeding up learning. Transfer learning is particularly important when target-domain data is limited, such as when data on only a few individuals is available from domains with hundreds of entities connected by a variety of relations. We also address this challenging case and develop a general transfer learning approach that makes effective use of such limited target data in several social network domains.

Finally, we develop an application of MLNs to the problem of Web query disambiguation in a more privacy-aware setting where the only information available about a user is that captured in a short search session of 5--6 previous queries on average. This setting contrasts with previous work that typically assumes the availability of long user-specific search histories. To compensate for the scarcity of user-specific information, our approach exploits the relations between users, search terms, and URLs. We demonstrate the effectiveness of our approach in the presence of noise and show that it outperforms several natural baselines on a large data set collected from the MSN search engine.

ML ID: 235

- Max-Margin Weight Learning for Markov Logic Networks

[Details] [PDF] [Slides]

Tuyen N. Huynh and Raymond J. Mooney

In *Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Part 1*, 564--579, Bled, Slovenia, September 2009.

Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both first-order logic and graphical models. Existing discriminative weight learning methods for MLNs all try to learn weights that optimize the Conditional Log Likelihood (CLL) of the training examples. In this work, we present a new discriminative weight learning method for MLNs based on a max-margin framework. This results in a new model, Max-Margin Markov Logic Networks (M3LNs), that combines the expressiveness of MLNs with the predictive accuracy of structural Support Vector Machines (SVMs). To train the proposed model, we design a new approximation algorithm for loss-augmented inference in MLNs based on Linear Programming (LP). The experimental results show that the proposed approach generally achieves higher F1 scores than the current best discriminative weight learner for MLNs.

ML ID: 234

- Learning to Disambiguate Search Queries from Short Sessions

[Details] [PDF]

Lilyana Mihalkova and Raymond Mooney

In *Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Part 2*, 111--127, Bled, Slovenia, September 2009.

Web searches tend to be short and ambiguous. It is therefore not surprising that Web query disambiguation is an actively researched topic. To provide a personalized experience for a user, most existing work relies on search engine log data in which the search activities of that particular user, as well as other users, are recorded over long periods of time. Such approaches may raise privacy concerns and may be difficult to implement for pragmatic reasons. We present an approach to Web query disambiguation that bases its predictions only on a short glimpse of user search activity, captured in a brief session of 4--6 previous searches on average. Our method exploits the relations of the current search session to previous similarly short sessions of other users in order to predict the user's intentions and is based on Markov logic, a statistical relational learning model that has been successfully applied to challenging language problems in the past. We present empirical results that demonstrate the effectiveness of our proposed approach on data collected from a commercial general-purpose search engine.

ML ID: 233

- Max-Margin Weight Learning for Markov Logic Networks

[Details] [PDF]

Tuyen N. Huynh and Raymond J. Mooney

In *Proceedings of the International Workshop on Statistical Relational Learning (SRL-09)*, Leuven, Belgium, July 2009.

Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both first-order logic and graphical models. Existing discriminative weight learning methods for MLNs all try to learn weights that optimize the Conditional Log Likelihood (CLL) of the training examples. In this work, we present a new discriminative weight learning method for MLNs based on a max-margin framework. This results in a new model, Max-Margin Markov Logic Networks (M3LNs), that combines the expressiveness of MLNs with the predictive accuracy of structural Support Vector Machines (SVMs). To train the proposed model, we design a new approximation algorithm for loss-augmented inference in MLNs based on Linear Programming (LP). The experimental results show that the proposed approach generally achieves higher F1 scores than the current best discriminative weight learner for MLNs.

ML ID: 232

- Speeding up Inference In Statistical Relational Learning by Clustering Similar Query Literals

[Details] [PDF]

Lilyana Mihalkova and Matthew Richardson

In *Proceedings of the 19th International Conference on Inductive Logic Programming (ILP-09)*, Leuven, Belgium, July 2009.

Markov logic networks (MLNs) have been successfully applied to several challenging problems by taking a programming language approach where a set of formulas is hand-coded and weights are learned from data. Because inference plays an important role in this process, programming with an MLN would be significantly facilitated by speeding up inference. We present a new meta-inference algorithm that exploits the repeated structure frequently present in relational domains to speed up existing inference techniques. Our approach first clusters the query literals and then performs full inference for only one representative from each cluster. The clustering step incurs only a one-time up-front cost when weights are learned over a fixed structure.

ML ID: 231

- Probabilistic Abduction using Markov Logic Networks

[Details] [PDF] [Slides]

Rohit J. Kate and Raymond J. Mooney

In *Proceedings of the IJCAI-09 Workshop on Plan, Activity, and Intent Recognition (PAIR-09)*, Pasadena, CA, July 2009.

Abduction is inference to the best explanation of a given set of evidence. It is important for plan or intent recognition systems. Traditional approaches to abductive reasoning have either used first-order logic, which is unable to reason under uncertainty, or Bayesian networks, which can handle uncertainty using probabilities but cannot directly handle an unbounded number of related entities. This paper proposes a new method for probabilistic abductive reasoning that combines the capabilities of first-order logic and graphical models by using Markov logic networks. Experimental results on a plan recognition task demonstrate the effectiveness of this method.

ML ID: 228

- Transfer Learning from Minimal Target Data by Mapping across Relational Domains

[Details] [PDF]

Lilyana Mihalkova and Raymond Mooney

In *Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09)*, 1163--1168, Pasadena, CA, July 2009.

A central goal of transfer learning is to enable learning when training data from the domain of interest is limited. Yet, work on transfer across relational domains has so far focused on the case where there is a significant amount of target data. This paper bridges this gap by studying transfer when the amount of target data is minimal and consists of information about just a handful of entities. In the extreme case, only a single entity is known. We present the SR2LR algorithm that finds an effective mapping of predicates from a source model to the target domain in this setting and thus renders pre-existing knowledge useful to the target task. We demonstrate SR2LR's effectiveness in three benchmark relational domains on social interactions and study its behavior as information about an increasing number of entities becomes available.

ML ID: 227

- Search Query Disambiguation from Short Sessions

[Details] [PDF]

Lilyana Mihalkova and Raymond Mooney

In *Beyond Search: Computational Intelligence for the Web Workshop at NIPS*, 2008.

Web searches tend to be short and ambiguous. It is therefore not surprising that Web query disambiguation is an actively researched topic. However, most existing work relies on the existence of search engine log data in which each user's search activities are recorded over long periods of time. Such approaches may raise privacy concerns and may be difficult to implement for pragmatic reasons. In this work, we present an approach to Web query disambiguation that bases its predictions only on a short glimpse of user search activity, captured in a brief session of about 5--6 previous searches on average. Our method exploits the relations of the current search session in which the ambiguous query is issued to previous sessions in order to predict the user's intentions and is based on Markov logic. We present empirical results that demonstrate the effectiveness of our proposed approach on data collected from a commercial general-purpose search engine.

ML ID: 225

- Discriminative Structure and Parameter Learning for Markov Logic Networks

[Details] [PDF] [Slides]

Tuyen N. Huynh and Raymond J. Mooney

In *Proceedings of the 25th International Conference on Machine Learning (ICML)*, Helsinki, Finland, July 2008.

Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both first-order logic and graphical models. Existing methods for learning the logical structure of an MLN are not discriminative; however, many relational learning problems involve specific target predicates that must be inferred from given background information. We found that existing MLN methods perform very poorly on several such ILP benchmark problems, and we present improved discriminative methods for learning MLN clauses and weights that outperform existing MLN and traditional ILP methods.

ML ID: 220

- Transfer Learning by Mapping with Minimal Target Data

[Details] [PDF]

Lilyana Mihalkova and Raymond J. Mooney

In *Proceedings of the AAAI-08 Workshop on Transfer Learning For Complex Tasks*, Chicago, IL, July 2008.

This paper introduces the single-entity-centered setting for transfer across two relational domains. In this setting, target domain data contains information about only a single entity. We present the SR2LR algorithm that finds an effective mapping of the source model to the target domain in this setting and demonstrate its effectiveness in three relational domains. Our experiments additionally show that the most accurate model for the source domain is not always the best model to use for transfer.

ML ID: 218

- Improving Learning of Markov Logic Networks using Transfer and Bottom-Up Induction

[Details] [PDF]

Lilyana Mihalkova

Technical Report UT-AI-TR-07-341, Artificial Intelligence Lab, University of Texas at Austin, Austin, TX, May 2007.

Statistical relational learning (SRL) algorithms combine ideas from rich knowledge representations, such as first-order logic, with those from probabilistic graphical models, such as Markov networks, to address the problem of learning from multi-relational data. One challenge posed by such data is that individual instances are frequently very large and include complex relationships among the entities. Moreover, because separate instances do not follow the same structure and contain varying numbers of entities, they cannot be effectively represented as a feature vector. SRL models and algorithms have been successfully applied to a wide variety of domains such as social network analysis, biological data analysis, and planning, among others. Markov logic networks (MLNs) are a recently developed SRL model that consists of weighted first-order clauses. MLNs can be viewed as templates that define Markov networks when provided with the set of constants present in a domain. MLNs are therefore very powerful because they inherit the expressivity of first-order logic. At the same time, MLNs can flexibly deal with noisy or uncertain data to produce probabilistic predictions for a set of propositions. MLNs have also been shown to subsume several other popular SRL models.

The expressive power of MLNs comes at a cost: structure learning, or learning the first-order clauses of the model, is a very computationally intensive process that needs to sift through a large hypothesis space with many local maxima and plateaus. It is therefore an important research problem to develop learning algorithms that improve the speed and accuracy of this process. The main contributions of this proposal are two algorithms for learning the structure of MLNs that proceed in a more data-driven fashion, in contrast to most existing SRL algorithms. The first algorithm we present, R-TAMAR, improves learning by transferring the structure of an MLN learned in a domain related to the current one. It first diagnoses the transferred structure and then focuses its efforts only on the regions it determines to be incorrect. Our second algorithm, BUSL, improves structure learning from scratch by approaching the problem in a more bottom-up fashion and first constructing a variablized Markov network template that significantly constrains the space of viable clause candidates. We demonstrate the effectiveness of our methods in three social domains.

Our proposed future work directions include testing BUSL in additional domains and extending it so that it can be used not only to learn from scratch, but also to revise a provided MLN structure. Our most ambitious long-term goal is to develop a system that transfers knowledge from multiple potential sources. An important prerequisite to such a system is a method for measuring the similarity between domains. We would also like to extend BUSL to learn other SRL models and to handle functions.

ML ID: 217

- Learning for Information Extraction: From Named Entity Recognition and Disambiguation To Relation Extraction

[Details] [PDF]

Razvan Constantin Bunescu

PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2007. 150 pages. Also as Technical Report AI07-345, Artificial Intelligence Lab, University of Texas at Austin, August 2007.

Information Extraction, the task of locating textual mentions of specific types of entities and their relationships, aims at representing the information contained in text documents in a structured format that is more amenable to applications in data mining, question answering, or the semantic web. The goal of our research is to design information extraction models that obtain improved performance by exploiting types of evidence that have not been explored in previous approaches. Since designing an extraction system through introspection by a domain expert is a laborious and time-consuming process, the focus of this thesis will be on methods that automatically induce an extraction model by training on a dataset of manually labeled examples.

Named Entity Recognition is an information extraction task that is concerned with finding textual mentions of entities that belong to a predefined set of categories. We approach this task as a phrase classification problem, in which candidate phrases from the same document are collectively classified. Global correlations between candidate entities are captured in a model built using the expressive framework of Relational Markov Networks. Additionally, we propose a novel tractable approach to phrase classification for named entity recognition based on a special Junction Tree representation.

Classifying entity mentions into a predefined set of categories achieves only a partial disambiguation of the names. This is further refined in the task of Named Entity Disambiguation, where names need to be linked to their actual denotations. In our research, we use Wikipedia as a repository of named entities and propose a ranking approach to disambiguation that exploits learned correlations between words from the name context and categories from the Wikipedia taxonomy.

Relation Extraction refers to finding relevant relationships between entities mentioned in text documents. Our approaches to this information extraction task differ in the type and the amount of supervision required. We first propose two relation extraction methods that are trained on documents in which sentences are manually annotated for the required relationships. In the first method, the extraction patterns correspond to sequences of words and word classes anchored at two entity names occurring in the same sentence. These are used as implicit features in a generalized subsequence kernel, with weights computed through training of Support Vector Machines. In the second approach, the implicit extraction features are focused on the shortest path between the two entities in the word-word dependency graph of the sentence. Finally, in a significant departure from previous learning approaches to relation extraction, we propose reducing the amount of required supervision to only a handful of pairs of entities known to exhibit or not exhibit the desired relationship. Each pair is associated with a bag of sentences extracted automatically from a very large corpus. We extend the subsequence kernel to handle this weaker form of supervision, and describe a method for weighting features in order to focus on those correlated with the target relation rather than with the individual entities. The resulting Multiple Instance Learning approach offers a competitive alternative to previous relation extraction methods, at a significantly reduced cost in human supervision.

ML ID: 213

- Mapping and Revising Markov Logic Networks for Transfer Learning

[Details] [PDF]

Lilyana Mihalkova, Tuyen N. Huynh, Raymond J. Mooney

In *Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI-07)*, 608--614, Vancouver, BC, July 2007.

Transfer learning addresses the problem of how to leverage knowledge acquired in a source domain to improve the accuracy and speed of learning in a related target domain. This paper considers transfer learning with Markov logic networks (MLNs), a powerful formalism for learning in relational domains. We present a complete MLN transfer system that first autonomously maps the predicates in the source MLN to the target domain and then revises the mapped structure to further improve its accuracy. Our results in several real-world domains demonstrate that our approach successfully reduces the amount of time and training data needed to learn an accurate model of a target domain over learning from scratch.

ML ID: 203

- Bottom-Up Learning of Markov Logic Network Structure

[Details] [PDF]

Lilyana Mihalkova and Raymond J. Mooney

In *Proceedings of the 24th International Conference on Machine Learning (ICML-2007)*, Corvallis, OR, June 2007.

Markov logic networks (MLNs) are a statistical relational model that consists of weighted first-order clauses and generalizes first-order logic and Markov networks. The current state-of-the-art algorithm for learning MLN structure follows a top-down paradigm where many potential candidate structures are systematically generated without considering the data and then evaluated using a statistical measure of their fit to the data. Even though this existing algorithm outperforms an impressive array of benchmarks, its greedy search is susceptible to local maxima or plateaus. We present a novel algorithm for learning MLN structure that follows a more bottom-up approach to address this problem. Our algorithm uses a "propositional" Markov network learning method to construct "template" networks that guide the construction of candidate clauses. Our algorithm significantly improves accuracy and learning time over the existing top-down approach in three real-world domains.

ML ID: 202

- Statistical Relational Learning for Natural Language Information Extraction

[Details] [PDF]

Razvan Bunescu and Raymond J. Mooney

In L. Getoor and B. Taskar, editors, *Introduction to Statistical Relational Learning*, 535--552, Cambridge, MA, 2007. MIT Press.

Understanding natural language presents many challenging problems that lend themselves to statistical relational learning (SRL). Historically, both logical and probabilistic methods have found wide application in natural language processing (NLP). NLP inevitably involves reasoning about an arbitrary number of entities (people, places, and things) that have an unbounded set of complex relationships between them. Representing and reasoning about unbounded sets of entities and relations has generally been considered a strength of predicate logic. However, NLP also requires integrating uncertain evidence from a variety of sources in order to resolve numerous syntactic and semantic ambiguities. Effectively integrating multiple sources of uncertain evidence has generally been considered a strength of Bayesian probabilistic methods and graphical models. Consequently, NLP problems are particularly suited for SRL methods that combine the strengths of first-order predicate logic and probabilistic graphical models. In this article, we review our recent work on using Relational Markov Networks (RMNs) for information extraction, the problem of identifying phrases in natural language text that refer to specific types of entities. We use the expressive power of RMNs to represent and reason about several specific relationships between candidate entities and thereby collectively identify the appropriate set of phrases to extract. We present experiments on learning to extract protein names from biomedical text, which demonstrate the advantage of this approach over existing IE methods.

ML ID: 165

- Transfer Learning with Markov Logic Networks

[Details] [PDF]

Lilyana Mihalkova and Raymond Mooney

In *Proceedings of the ICML-06 Workshop on Structural Knowledge Transfer for Machine Learning*, Pittsburgh, PA, June 2006.

We propose a new algorithm for transfer learning of Markov Logic Network (MLN) structure. An important aspect of our approach is that it first diagnoses the provided source MLN and then focuses on re-learning only the incorrect portions. Experiments in a pair of synthetic domains demonstrate that this strategy significantly decreases the search space and speeds up learning while maintaining a level of accuracy comparable to that of the current best algorithm.

ML ID: 189

- Learning for Collective Information Extraction

[Details] [PDF]

Razvan C. Bunescu

Technical Report TR-05-02, Department of Computer Sciences, University of Texas at Austin, October 2005. Ph.D. proposal.

An Information Extraction (IE) system analyses a set of documents with the aim of identifying certain types of entities and relations between them. Most IE systems treat separate potential extractions as independent. However, in many cases, considering influences between different candidate extractions could improve overall accuracy. For example, phrase repetitions inside a document are usually associated with the same entity type, the same being true for acronyms and their corresponding long form. One of our goals in this thesis is to show how these and potentially other types of correlations can be captured by a particular type of undirected probabilistic graphical model. Inference and learning using this graphical model allows for collective information extraction in a way that exploits the mutual influence between possible extractions. Preliminary experiments on learning to extract named entities from biomedical and newspaper text demonstrate the advantages of our approach.

The benefit of doing collective classification comes, however, at a cost: in the general case, exact inference in the resulting graphical model has an exponential time complexity. The standard solution, which is also the one that we used in our initial work, is to resort to approximate inference. In this proposal we show that by considering only a selected subset of mutual influences between candidate extractions, exact inference can be done in linear time. Consequently, a short-term goal is to run comparative experiments that would help us choose between the two approaches: exact inference with a restricted subset of mutual influences or approximate inference with the full set of influences.

The set of issues that we intend to investigate in future work is twofold. One direction refers to applying the already developed framework to other natural language tasks that may benefit from the same types of influences, such as word sense disambiguation and part-of-speech tagging. Another direction concerns the design of a sufficiently general framework that would allow a seamless integration of cues from a variety of knowledge sources. We contemplate using generic sources such as external dictionaries, or web statistics on discriminative textual patterns. We also intend to alleviate the modeling problems due to the intrinsic local nature of entity features by exploiting syntactic information. All these generic features will be input to a feature selection algorithm, so that in the end we obtain a model which is both compact and accurate.

ML ID: 155

- Collective Information Extraction with Relational Markov Networks

[Details] [PDF]

Razvan Bunescu and Raymond J. Mooney

In *Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)*, 439--446, Barcelona, Spain, July 2004.

Most information extraction (IE) systems treat separate potential extractions as independent. However, in many cases, considering influences *between* different potential extractions could improve overall accuracy. Statistical methods based on *undirected* graphical models, such as *conditional random fields* (CRFs), have been shown to be an effective approach to learning accurate IE systems. We present a new IE method that employs Relational Markov Networks (a generalization of CRFs), which can represent arbitrary dependencies between extractions. This allows for "collective information extraction" that exploits the mutual influence between possible extractions. Experiments on learning to extract protein names from biomedical text demonstrate the advantages of this approach.

ML ID: 152

- Relational Markov Networks for Collective Information Extraction

[Details] [PDF]

Razvan Bunescu and Raymond J. Mooney

In *Proceedings of the ICML-04 Workshop on Statistical Relational Learning and its Connections to Other Fields*, Banff, Alberta, July 2004.

Most information extraction (IE) systems treat separate potential extractions as independent. However, in many cases, considering influences *between* different potential extractions could improve overall accuracy. Statistical methods based on *undirected* graphical models, such as *conditional random fields* (CRFs), have been shown to be an effective approach to learning accurate IE systems. We present a new IE method that employs Relational Markov Networks, which can represent arbitrary dependencies between extractions. This allows for "collective information extraction" that exploits the mutual influence between possible extractions. Experiments on learning to extract protein names from biomedical text demonstrate the advantages of this approach.

ML ID: 145

- A Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields

[Details] [PDF]

Mikhail Bilenko and Sugato Basu

In *Proceedings of the ICML-2004 Workshop on Statistical Relational Learning and its Connections to Other Fields (SRL-2004)*, Banff, Canada, July 2004.

Recently, a number of methods have been proposed for semi-supervised clustering that employ supervision in the form of pairwise constraints. We describe a probabilistic model for semi-supervised clustering based on Hidden Markov Random Fields (HMRFs) that incorporates relational supervision. The model leads to an EM-style clustering algorithm, the E-step of which requires collective assignment of instances to cluster centroids under the constraints. We evaluate three known techniques for such collective assignment: belief propagation, linear programming relaxation, and iterated conditional modes (ICM). The first two methods attempt to globally approximate the optimal assignment, while ICM is a greedy method. Experimental results indicate that global methods outperform the greedy approach when relational supervision is limited, while their benefits diminish as more pairwise constraints are provided.

ML ID: 144

- Automated Construction of Database Interfaces: Integrating Statistical and Relational Learning for Semantic Parsing

[Details] [PDF]

Lappoon R. Tang and Raymond J. Mooney

In *Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000)*, 133--141, Hong Kong, October 2000.

The development of natural language interfaces (NLIs) for databases has been a challenging problem in natural language processing (NLP) since the 1970s. The need for NLIs has become more pronounced due to the widespread access to complex databases now available through the Internet. A challenging problem for empirical NLP is the automated acquisition of NLIs from training examples. We present a method for integrating statistical and relational learning techniques for this task which exploits the strengths of both approaches. Experimental results from three different domains suggest that such an approach is more robust than a previous purely logic-based approach.

ML ID: 102