Learning to Predict Readability using Diverse Linguistic Features (2010)
Rohit J. Kate, Xiaoqiang Luo, Siddharth Patwardhan, Martin Franz, Radu Florian, Raymond J. Mooney, Salim Roukos and Chris Welty
In this paper we consider the problem of building a system to predict readability of natural-language documents. Our system is trained using diverse features based on syntax and language models which are generally indicative of readability. The experimental results on a dataset of documents from a mix of genres show that the predictions of the learned system are more accurate than the predictions of naive human judges when compared against the predictions of linguistically-trained expert human judges. The experiments also compare the performance of different learning algorithms and different types of feature sets when used for predicting readability.
In 23rd International Conference on Computational Linguistics (COLING 2010), 2010.

Rohit Kate, Postdoctoral Alumni, katerj [at] uwm edu
Raymond J. Mooney, Faculty, mooney [at] cs utexas edu