The Boeing-Princeton-ISI (BPI) Textual Entailment Test Suite

About the BPI Suite

This textual entailment test suite was developed jointly by Boeing, Princeton, and ISI under the AQUAINT program, specifically to look at entailment problems requiring world knowledge. It contains 125 positive and 125 negative (no entailment) pairs. Compared with the PASCAL RTE data sets, the BPI suite is syntactically simpler but semantically challenging, with the intention of focusing more on the knowledge requirements than on the purely linguistic ones.

The original 65 texts T are drawn from a mixture of newspaper articles, the AQUAINT KB-Eval data set, PASCAL RTE, and texts written by hand. The H sentences were written by hand to capture what seemed like common-sense inferences that followed from the text, without (as much as possible) regard for what might or might not be feasible to answer computationally. In particular, the examples include inferences requiring world knowledge, not just syntactic manipulation. Many are very challenging. The original dataset had about 300 positive entailments; these were then culled back to those which seemed feasible in the medium term, resulting in 125 positive entailments. A further 125 negative entailments were then added by hand, resulting in a final corpus size of 250 pairs.

We also performed an analysis of what kinds of knowledge were required for the 125 positive entailments, resulting in 15 somewhat loose categories of knowledge. A list of these, plus pointers to the original example pairs, is also given below.


This test suite may be downloaded and used without restriction, though we would appreciate an acknowledgement if you publish results using it, and we would also be interested to hear what kind of performance you get!
Peter Clark, Christiane Fellbaum, Jerry Hobbs