Named entity recognition using a character-based probabilistic approach

Authors:
Casey Whitelaw;Jon Patrick
Affiliations:
University of Sydney;University of Sydney
Venue:
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Year:
2003

Abstract

We present a named entity recognition and classification system that uses only probabilistic character-level features. Classifications by multiple orthographic tries are combined in a hidden Markov model framework to incorporate both internal and contextual evidence. As part of the system, we perform a preprocessing stage in which capitalisation is restored to sentence-initial and all-caps words with high accuracy. We report F-values of 86.65 and 79.78 for the English datasets, and 50.62 and 54.43 for the German datasets.
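
As a rough illustration of what a probabilistic character-level trie might look like, the sketch below stores class counts at every prefix node and classifies a word from the deepest prefix it shares with the training words. This is a minimal sketch under assumptions, not the authors' implementation: the names OrthographicTrie, add and classify, the LOC/PER labels, and the toy training words are all hypothetical, and the abstract does not specify how the tries are built, smoothed, or combined in the hidden Markov model.

```python
from collections import Counter


class TrieNode:
    """One prefix in the trie, with counts of the labels seen under it."""

    def __init__(self):
        self.children = {}
        self.counts = Counter()


class OrthographicTrie:
    """Hypothetical prefix trie mapping character sequences to label counts."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, word, label):
        # Record the label at the root and at every prefix of the word.
        node = self.root
        node.counts[label] += 1
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
            node.counts[label] += 1

    def classify(self, word):
        # Walk down as far as the word matches, then return the relative
        # label frequencies stored at the deepest matching prefix.
        node = self.root
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                break
            node = nxt
        total = sum(node.counts.values())
        if total == 0:
            return {}
        return {label: n / total for label, n in node.counts.items()}


# Toy training data (invented for illustration only).
trie = OrthographicTrie()
for word, label in [("Sydney", "LOC"), ("Sydenham", "LOC"),
                    ("Smith", "PER"), ("Smythe", "PER")]:
    trie.add(word, label)

print(trie.classify("Sydnor"))  # backs off to the prefix "Sydn"
print(trie.classify("Sx"))      # backs off to the prefix "S"
```

An unseen word backs off to its longest known prefix, so the estimate stays usable for words outside the training data. A system along the lines described would presumably also use tries over suffixes and neighbouring words and feed their outputs into the sequence model, but those details are not given here.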