An Algorithm that Learns What's in a Name

  • Authors:
  • Daniel M. Bikel; Richard Schwartz; Ralph M. Weischedel

  • Affiliations:
  • BBN Systems and Technologies, 70 Fawcett Street, Cambridge MA 02138. dbikel@seas.upenn.edu; BBN Systems and Technologies, 70 Fawcett Street, Cambridge MA 02138. schwartz@bbn.com; BBN Systems and Technologies, 70 Fawcett Street, Cambridge MA 02138. weisched@bbn.com

  • Venue:
  • Machine Learning - Special issue on natural language learning
  • Year:
  • 1999

Abstract

In this paper, we present IdentiFinder™, a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities. We have evaluated the model in English (based on data from the Sixth and Seventh Message Understanding Conferences [MUC-6, MUC-7] and broadcast news) and in Spanish (based on data distributed through the First Multilingual Entity Task [MET-1]), and on speech input (based on broadcast news). We report results here on standard materials only to quantify performance on data available to the community, namely, MUC-6 and MET-1. Results have been consistently better than reported by any other learning algorithm. IdentiFinder's performance is competitive with approaches based on handcrafted rules on mixed case text and superior on text where case information is not available. We also present a controlled experiment showing the effect of training set size on performance, demonstrating that as little as 100,000 words of training data is adequate to get performance around 90% on newswire. Although we present our understanding of why this algorithm performs so well on this class of problems, we believe that significant improvement in performance may still be possible.