Named entity recognition using a character-based probabilistic approach

  • Authors: Casey Whitelaw; Jon Patrick
  • Affiliations: University of Sydney; University of Sydney
  • Venue: CONLL '03: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
  • Year: 2003

Abstract

We present a named entity recognition and classification system that uses only probabilistic character-level features. Classifications by multiple orthographic tries are combined in a hidden Markov model framework to incorporate both internal and contextual evidence. As part of the system, we perform a preprocessing stage in which capitalisation is restored to sentence-initial and all-caps words with high accuracy. We report F-values of 86.65 and 79.78 for the English datasets, and 50.62 and 54.43 for the German datasets.
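
To make the idea of an orthographic trie concrete, here is a minimal Python sketch of a character trie that stores class counts at every prefix and returns relative class frequencies for the longest prefix it has seen. It is an illustration only, not the authors' implementation: the names `TrieNode`, `CharTrie`, `add`, and `predict`, the labels `LOC`, `ORG`, and `O`, and the back-off to the deepest matching prefix are all assumptions made for this example. The abstract does not specify how the tries are built or smoothed, and the hidden Markov model combination and capitalisation restoration steps are not shown.

```python
from collections import Counter


class TrieNode:
    """One node per character prefix, holding class counts for that prefix."""

    def __init__(self):
        self.children = {}
        self.counts = Counter()


class CharTrie:
    """Toy character trie mapping word prefixes to class-frequency estimates.

    Hypothetical illustration; not the system described in the abstract.
    """

    def __init__(self):
        self.root = TrieNode()

    def add(self, word, label):
        """Record one labelled word, updating counts along its prefix path."""
        node = self.root
        node.counts[label] += 1
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
            node.counts[label] += 1

    def predict(self, word):
        """Return relative class frequencies at the deepest matching prefix."""
        node = self.root
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                break  # unseen continuation: back off to the prefix seen so far
            node = nxt
        total = sum(node.counts.values())
        if total == 0:
            return {}
        return {label: n / total for label, n in node.counts.items()}


# Made-up training examples, for illustration only.
trie = CharTrie()
for word, label in [("Sydney", "LOC"), ("Sydenham", "LOC"),
                    ("Symantec", "ORG"), ("the", "O")]:
    trie.add(word, label)

print(trie.predict("Sydford"))  # backs off to the prefix "Syd"
```

In this sketch an unseen word is classified purely from its characters, which is the sense in which such a trie provides internal evidence. A fuller system would presumably also need tries over surrounding context and some form of smoothing, neither of which is attempted here.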