The Boeing-Princeton-ISI (BPI) Textual Entailment Test Suite
The original 65 texts T are drawn from a mixture of newspaper articles, the AQUAINT KB-Eval data set, and PASCAL RTE, with some written by hand. The H sentences were written by hand as what seemed like common-sense inferences that followed from the text, without (as much as possible) regard for what might or might not be feasible to answer computationally. In particular, the examples include inferences requiring world knowledge, not just syntactic manipulation. Many are very challenging. The original dataset contained about 300 positive entailments; these were then culled back to those which seemed feasible in the medium term, resulting in 125 positive entailments. 125 negative entailments were then added by hand, resulting in a final corpus size of 250 pairs.
We also analyzed what kinds of knowledge were required for the 125 positive entailments, resulting in 15 somewhat loose categories of knowledge. A list of these, with pointers to the original example pairs, is also given below.