An Algorithm that Learns What's in a Name

Daniel Bikel, Richard Schwartz, and Ralph M. Weischedel
BBN Systems & Technologies
70 Fawcett Street
Cambridge MA 02138
email: weisched@bbn.com

Abstract

In this paper, we present a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities. We have evaluated the software, IdentiFinder™, in English (based on MUC-6, MUC-7, and broadcast news data), in Spanish (based on data distributed through MET-1), and on speech input (based on broadcast news). We report results here only on standard materials, namely MUC-6 and MET-1, in order to quantify performance on data available to the community. Results have been consistently better than those reported for any other learning algorithm. IdentiFinder's performance is competitive with approaches based on handcrafted rules on mixed-case text and superior on text where case information is not available. We also present a controlled experiment showing the effect of training set size on performance, demonstrating that as little as 100,000 words of training data is adequate to reach performance around 90% on newswire. Though we present our understanding of why this algorithm performs so well on this class of problems, we believe that significant improvement in performance may still be possible.