Improving accuracy in word class tagging through the combination of machine learning systems

Authors: Hans van Halteren (University of Nijmegen); Walter Daelemans (University of Antwerp); Jakub Zavrel (University of Antwerp)
Venue: Computational Linguistics
Year: 2001

Abstract

We examine how differences in language models, learned by different data-driven systems performing the same NLP task, can be exploited to yield a higher accuracy than the best individual system. We do this by means of experiments involving the task of morphosyntactic word class tagging, on the basis of three different tagged corpora. Four well-known tagger generators (hidden Markov model, memory-based, transformation rules, and maximum entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second-stage classifiers. All combination taggers outperform their best component. The reduction in error rate varies with the material in question, but can be as high as 24.3% with the LOB corpus.
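The combination step described in the abstract can be illustrated with a small sketch. It assumes each component tagger has already produced one tag per token. The function and class names, the tie-breaking rule, and the lookup-table second stage are illustrative assumptions. They are not the paper's exact voting strategies or second-stage learners. The last helper shows how a relative error-rate reduction, such as the 24.3% figure, is computed from the accuracies of the best component and of the combination.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def majority_vote(votes: Sequence[str]) -> str:
    """Return the tag proposed by most component taggers.

    Assumption: ties go to the earliest tagger in `votes`, so callers
    should list components from best to worst.
    """
    if not votes:
        raise ValueError("no votes")
    counts = Counter(votes)
    best = max(counts.values())
    for tag in votes:  # first tag, in tagger order, that reaches the top count
        if counts[tag] == best:
            return tag
    raise AssertionError("unreachable")


def weighted_vote(votes: Sequence[str], weights: Sequence[float]) -> str:
    """Each tagger's vote counts with a weight, e.g. its accuracy on held-out data."""
    scores: Dict[str, float] = defaultdict(float)
    for tag, weight in zip(votes, weights):
        scores[tag] += weight
    return max(scores, key=scores.get)  # ties: first tag inserted wins


class TupleStacker:
    """Hypothetical second-stage classifier: a plain lookup table.

    It maps the tuple of component tags to the correct tag seen most often
    with that tuple. It should be trained on data the component taggers
    were not trained on. Unseen tuples fall back to majority voting.
    """

    def __init__(self) -> None:
        self.table: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)

    def fit(self, tuples: List[Tuple[str, ...]], gold: List[str]) -> "TupleStacker":
        for component_tags, correct in zip(tuples, gold):
            self.table[component_tags][correct] += 1
        return self

    def predict(self, component_tags: Tuple[str, ...]) -> str:
        if component_tags in self.table:  # membership test does not insert
            return self.table[component_tags].most_common(1)[0][0]
        return majority_vote(component_tags)


def relative_error_reduction(best_accuracy: float, combined_accuracy: float) -> float:
    """Fraction of the best component's errors that the combination removes."""
    best_error = 1.0 - best_accuracy
    combined_error = 1.0 - combined_accuracy
    return (best_error - combined_error) / best_error


if __name__ == "__main__":
    votes = ("NN", "VB", "NN", "JJ")  # hypothetical outputs of four taggers for one token
    print(majority_vote(votes))  # NN
    print(weighted_vote(votes, [0.96, 0.97, 0.95, 0.94]))  # NN (1.91 vs 0.97 vs 0.94)
    print(f"{relative_error_reduction(0.96, 0.97):.1%}")  # 25.0% (hypothetical accuracies)
```

Weighted voting lets a stronger tagger outvote a weaker one. A learned second stage can go further and learn when a particular tagger is usually right, even if it is outvoted.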