Categorizing unknown words: using decision trees to identify names and misspellings

  • Author: Janine Toole
  • Affiliation: Simon Fraser University, Burnaby, BC, Canada
  • Venue: ANLC '00 Proceedings of the sixth conference on Applied natural language processing
  • Year: 2000

Abstract

This paper introduces a system for categorizing unknown words. The system is based on a multicomponent architecture in which each component is responsible for identifying one class of unknown words. This paper focuses on the components that identify names and spelling errors. Each component uses a decision tree architecture to combine multiple types of evidence about the unknown word. The system is evaluated using data from live closed captions, a genre replete with a wide variety of unknown words.
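
The architecture described in the abstract, with one component per class of unknown word and each component combining several kinds of evidence through a decision tree, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy `LEXICON`, the three features (`capitalized`, `min_edit_distance`, `length`), the thresholds, and the hand-built trees `NAME_TREE` and `MISSPELLING_TREE`. A real system would induce its trees from labeled training data rather than write them by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical toy lexicon of known words (an assumption for this sketch).
LEXICON = {"the", "weather", "report", "today", "president", "said"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def features(word: str) -> Dict[str, float]:
    """Illustrative evidence types about an unknown word (assumed, not the paper's)."""
    lowered = word.lower()
    return {
        "capitalized": float(word[:1].isupper()),
        "min_edit_distance": float(min(edit_distance(lowered, w) for w in LEXICON)),
        "length": float(len(word)),
    }


@dataclass
class Leaf:
    label: bool  # True means "this component claims the word"


@dataclass
class Node:
    feature: str
    threshold: float
    low: "Tree"   # followed when the feature value <= threshold
    high: "Tree"  # followed when the feature value > threshold


Tree = Union[Leaf, Node]


def decide(tree: Tree, feats: Dict[str, float]) -> bool:
    """Walk from the root to a leaf, testing one feature per internal node."""
    while isinstance(tree, Node):
        tree = tree.high if feats[tree.feature] > tree.threshold else tree.low
    return tree.label


# Hand-built trees standing in for trees learned from data (hypothetical).
# Name: capitalized and not within one edit of any known word.
NAME_TREE = Node("capitalized", 0.5,
                 low=Leaf(False),
                 high=Node("min_edit_distance", 1.0, low=Leaf(False), high=Leaf(True)))

# Misspelling: within two edits of a known word and longer than three characters.
MISSPELLING_TREE = Node("min_edit_distance", 2.0,
                        low=Node("length", 3.0, low=Leaf(False), high=Leaf(True)),
                        high=Leaf(False))

# One component per class of unknown word.
COMPONENTS: Dict[str, Tree] = {"name": NAME_TREE, "misspelling": MISSPELLING_TREE}


def categorize(word: str) -> List[str]:
    """Return every class whose component claims the unknown word."""
    feats = features(word)
    return [label for label, tree in COMPONENTS.items() if decide(tree, feats)]


if __name__ == "__main__":
    for unknown in ["Toole", "wether", "zzzzzzqq"]:
        print(unknown, categorize(unknown))
```

Keeping each class in its own component means a new class of unknown word can be added by registering another tree in `COMPONENTS`, without changing the existing ones. How the actual system resolves a word claimed by more than one component, or by none, is not stated in the abstract, so this sketch simply returns all matching labels.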