Learning to Parse Natural Language with Maximum Entropy Models ADWAIT RATNAPARKHI adwait@unagi.cis.upenn.edu Dept. of Computer and Information Science University of Pennsylvania 200 South 33rd Street Philadelphia, PA 19104-6389 Now at IBM TJ Watson Research Center PO Box 218 Yorktown Heights, NY 10598 aratnapa@us.ibm.com This paper presents a machine learning system for parsing natural language that learns from manually parsed example sentences, and parses unseen data at state-of-the-art accuracies. Its machine learning technology, based on the maximum entropy framework, is highly reusable and not specific to the parsing problem, while the linguistic hints that it uses to learn can be specified concisely. It therefore requires a minimal amount of human effort and linguistic knowledge for its construction. In practice, the running time of the parser on a test sentence is linear with respect to the sentence length. We also demonstrate that the parser can train from other domains without modification to the modeling framework or the linguistic hints it uses to learn. Furthermore, this paper shows that research into rescoring the top 20 parses returned by the parser might yield accuracies dramatically higher than the state-of-the-art. Keywords: maximum entropy models, statistical parsing, corpus-based parsing