Memory-based named entity recognition using unannotated data

Authors:
Fien De Meulder;Walter Daelemans
Affiliations:
University of Antwerp;University of Antwerp
Venue:
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Year:
2003

Citing 1
Cited 5

Ranking algorithms for named-entity extraction: boosting and the voted perceptron

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

Mining knowledge from text using information extraction

ACM SIGKDD Explorations Newsletter - Natural language processing and text mining
Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Named entity normalization in user generated content

Proceedings of the second workshop on Analytics for noisy unstructured text data
Empirical study on the performance stability of named entity recognition model across domains

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Chinese named entity recognition based on multilevel linguistic features

IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

We used the memory-based learner Timbl (Daelemans et al., 2002) to find names in English and German newspaper text. A first system used only the training data, and a number of gazetteers. The results show that gazetteers are not beneficial in the English case, while they are for the German data. Type-token generalization was applied, but also reduced performance. The second system used gazetteers derived from the unannotated corpus, as well as the ratio of capitalized versus uncapitalized use of each word. These strategies gave an increase in performance.