Combining data-driven systems for improving Named Entity Recognition

Authors:
Z. Kozareva;O. Ferrández;A. Montoyo;R. Muñoz;A. Suárez;J. Gómez
Affiliations:
Natural Language Processing and Information Systems Group, Department of Software and Computing Systems, University of Alicante, Alicante, Spain (all authors)
Venue:
Data & Knowledge Engineering
Year:
2007

Abstract

The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of text. All these tasks benefit greatly from a Named Entity Recognizer (NER) in the preprocessing stage. This paper proposes a fully automatic NER system. The NER task involves not only the identification of proper names (Named Entities) in natural language text, but also their classification into a set of predefined categories, such as names of persons, organizations (companies, government organizations, committees, etc.), locations (cities, countries, rivers, etc.) and miscellaneous (movie titles, sport events, etc.). Throughout the paper, we examine the differences between language models learned by different data-driven classifiers confronted with the same NLP task, as well as ways to exploit these differences to yield a higher accuracy than the best individual classifier. Three machine learning classifiers (Hidden Markov Model, Maximum Entropy and Memory Based Learning) are trained on the same corpus to solve the NER task. After comparison, their outputs are combined using voting strategies. A comprehensive experimental evaluation of our system, together with a comparison against other systems, has been carried out within the framework of two specialized scientific competitions for NER, CoNLL-2002 and HAREM-2005. Finally, this paper describes the integration of our NER system into different NLP applications, specifically Geographic Information Retrieval and Conceptual Modelling.
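
As a rough illustration of the combination step mentioned in the abstract, the sketch below applies a simple plurality vote to the per-token tags of three classifiers. The abstract does not say which voting strategies the authors use, so this is only one possible scheme. The function name `majority_vote`, the classifier keys, the BIO-style tags, the example sentence and the tie-breaking rule (prefer the classifier listed first in `priority`) are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(
    predictions: Dict[str, Sequence[str]],
    priority: Sequence[str],
) -> List[str]:
    """Combine per-token tags by plurality vote.

    predictions maps a classifier name to its tag sequence for one sentence.
    priority lists the same classifier names, most trusted first; it is used
    only to break ties (an assumption of this sketch, not taken from the paper).
    """
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all classifiers must tag the same number of tokens")
    n_tokens = lengths.pop()

    combined = []
    for i in range(n_tokens):
        # Count how many classifiers proposed each tag for token i.
        votes = Counter(predictions[name][i] for name in priority)
        top = max(votes.values())
        winners = {tag for tag, count in votes.items() if count == top}
        # On a tie, take the tag of the highest-priority classifier among the winners.
        chosen = next(
            predictions[name][i]
            for name in priority
            if predictions[name][i] in winners
        )
        combined.append(chosen)
    return combined


# Hypothetical outputs for the four tokens "Juan vive en Alicante".
outputs = {
    "hmm":    ["B-PER", "O", "O", "B-LOC"],
    "maxent": ["B-PER", "O", "O", "B-ORG"],
    "mbl":    ["B-ORG", "O", "O", "B-LOC"],
}
print(majority_vote(outputs, priority=["maxent", "hmm", "mbl"]))
```

With three voters, a tie can only occur when all three classifiers disagree, in which case this sketch falls back to the first classifier in `priority`. Weighted voting, for instance by each classifier's per-class accuracy on held-out data, would be a natural variation.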