Ranking algorithms for named-entity extraction: boosting and the voted perceptron

  • Authors: Michael Collins
  • Affiliations: AT&T Labs-Research, New Jersey
  • Venue: ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
  • Year: 2002

Abstract

This paper describes algorithms which rerank the top N hypotheses from a maximum-entropy tagger, the application being the recovery of named-entity boundaries in a corpus of web data. The first approach uses a boosting algorithm for ranking problems. The second approach uses the voted perceptron algorithm. Both algorithms give comparable, significant improvements over the maximum-entropy baseline. The voted perceptron algorithm can be considerably more efficient to train, at some cost in computation on test examples.
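
To make the reranking setup concrete, here is a minimal sketch of a voted-perceptron reranker in Python. It is not the paper's implementation: the dense feature vectors, the function names (`train_voted_perceptron`, `rerank`), the `epochs` parameter, and the toy data are all assumptions made for illustration. Each training example is a list of candidate feature vectors plus the index of the best candidate. The sketch also suggests where the extra test-time cost comes from, since every stored weight vector scores every candidate before voting.

```python
from typing import List, Tuple

Vector = List[float]
Example = Tuple[List[Vector], int]   # (candidate feature vectors, index of best candidate)
Snapshot = Tuple[Vector, int]        # (weight vector, number of examples it survived)


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def best_index(w: Vector, candidates: List[Vector]) -> int:
    """Index of the highest-scoring candidate under w (first one wins ties)."""
    scores = [dot(w, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


def train_voted_perceptron(examples: List[Example], epochs: int = 5) -> List[Snapshot]:
    """Perceptron updates over ranked candidate lists, keeping every weight vector."""
    dim = len(examples[0][0][0])
    w = [0.0] * dim
    snapshots: List[Snapshot] = []
    count = 0  # examples the current weights have survived
    for _ in range(epochs):
        for candidates, gold in examples:
            guess = best_index(w, candidates)
            if guess == gold:
                count += 1
                continue
            if count > 0:
                snapshots.append((list(w), count))
            # Move weights toward the best candidate and away from the wrong guess.
            w = [wi + g - z for wi, g, z in zip(w, candidates[gold], candidates[guess])]
            count = 1
    snapshots.append((list(w), count))
    return snapshots


def rerank(snapshots: List[Snapshot], candidates: List[Vector]) -> int:
    """Each stored weight vector votes for its top candidate, weighted by survival count."""
    votes = [0] * len(candidates)
    for w, count in snapshots:
        votes[best_index(w, candidates)] += count
    return max(range(len(candidates)), key=votes.__getitem__)


# Toy data (hypothetical): the second feature marks the better hypothesis.
train = [
    ([[1.0, 0.0], [0.0, 1.0]], 1),
    ([[0.0, 2.0], [2.0, 0.0]], 0),
]
model = train_voted_perceptron(train)
print(rerank(model, [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))  # prints 1
```

Training makes one pass per epoch with a cheap additive update on each mistake, while prediction loops over all stored snapshots, so the cost at test time grows with the number of mistakes made during training.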