Abstract: For this year's RTE challenge we have continued to pursue a (somewhat) "logical" approach to recognizing entailment, in which our system, called BLUE (Boeing Language Understanding Engine), first creates a logic-based representation of a text T and then performs simple inference (using WordNet and the DIRT inference rule database) to try to infer a hypothesis H. The overall system can be viewed as comprising three main elements: parsing, WordNet, and DIRT, built on top of a simple baseline of bag-of-words comparison. Ablation studies suggest that WordNet substantially improves the accuracy scores, while, somewhat surprisingly, parsing and DIRT only marginally improve them. We illustrate and discuss these results. Overall, BLUE's reasoning is sometimes insightful but sometimes nonsensical, the primary challenges being noise in the knowledge sources, lack of world knowledge, and the difficulty of accurate syntactic and semantic analysis. Despite these challenges, we argue that forming semantic representations is a necessary first step towards the larger goal of machine reading, and worthy of further exploration. Our best scores were 61.5% (2-way), 54.7% (3-way), and F=0.29 (Search Pilot).
Poster (PDF): http://www.cs.utexas.edu/users/pclark/papers/rte5-poster.pdf
Slide Presentation (PowerPoint): http://www.cs.utexas.edu/users/pclark/papers/rte5.ppt
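
The abstract mentions a simple bag-of-words comparison as the baseline beneath parsing, WordNet, and DIRT. The sketch below illustrates what such a baseline could look like; it is not BLUE's actual code. The function names, the tokenization rule, and the 0.75 threshold are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(text, hypothesis):
    """Return the fraction of hypothesis words that also occur in the text."""
    h_words = tokenize(hypothesis)
    if not h_words:
        return 0.0
    t_words = tokenize(text)
    return len(h_words & t_words) / len(h_words)


def entails(text, hypothesis, threshold=0.75):
    """Guess entailment when enough hypothesis words are covered by the text.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return overlap_score(text, hypothesis) >= threshold


if __name__ == "__main__":
    T = "A man bought a red car"
    H = "A man bought a car"
    print(overlap_score(T, H), entails(T, H))
```

A baseline of this kind ignores word order and meaning, which is why the abstract describes lexical knowledge (WordNet), inference rules (DIRT), and parsing as layers added on top of it.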