Improving data driven wordclass tagging by system combination

Authors:
Hans van Halteren;Jakub Zavrel;Walter Daelemans
Affiliations:
University of Nijmegen, Nijmegen, The Netherlands;Tilburg University, Tilburg, The Netherlands;Tilburg University, Tilburg, The Netherlands
Venue:
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Year:
1998

Abstract

In this paper we examine how the differences in modelling between different data-driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second-stage classifiers. All combination taggers outperform their best component, with the best combination showing a 19.1% lower error rate than the best individual tagger.
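
The abstract does not say which voting strategies were used, so the following is only a minimal sketch of the simplest one that fits the description: a per-token plurality vote over the outputs of several taggers. The function names, the tagger labels ("hmm", "mbt", "rules", "maxent"), the example tags, and the tie-breaking rule (prefer the tagger listed first in a priority order) are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def plurality_vote(tags, priority):
    """Pick the tag proposed by the most taggers for one token.

    tags: dict mapping a (hypothetical) tagger name to its proposed tag.
    priority: tagger names, most trusted first; used only to break ties.
    """
    counts = Counter(tags.values())
    best = max(counts.values())
    tied = {tag for tag, n in counts.items() if n == best}
    # Tie-break (an assumption): take the tied tag backed by the
    # highest-priority tagger.
    for name in priority:
        if name in tags and tags[name] in tied:
            return tags[name]
    raise ValueError("priority must name at least one tagger in tags")


def combine(outputs, priority):
    """Combine aligned tag sequences token by token.

    outputs: dict mapping tagger name to its list of tags for one sentence.
    """
    names = list(outputs)
    length = len(outputs[names[0]])
    if any(len(outputs[name]) != length for name in names):
        raise ValueError("all taggers must tag the same number of tokens")
    return [
        plurality_vote({name: outputs[name][i] for name in names}, priority)
        for i in range(length)
    ]


# Illustrative (made-up) outputs of four taggers on a three-token sentence.
outputs = {
    "hmm": ["DT", "NN", "VB"],
    "mbt": ["DT", "VB", "VB"],
    "rules": ["DT", "NN", "VB"],
    "maxent": ["DT", "VB", "NN"],
}
priority = ["maxent", "hmm", "mbt", "rules"]
combined = combine(outputs, priority)
```

In this made-up example the second token is a two-against-two tie that the priority order resolves, while on the third token the three agreeing taggers outvote the highest-priority one. A second-stage classifier, the other family of combiners the abstract mentions, would instead be trained on the component taggers' outputs rather than counting votes.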