Combining Classifiers for word sense disambiguation

  • Authors:
  • Radu Florian;Silviu Cucerzan;Charles Schafer;David Yarowsky

  • Affiliations:
  • Department of Computer Science and Center for Language and Speech Processing, Johns Hopkins University, MD 21218, USA; e-mail: rflorian@cs.jhu.edu, silviu@cs.jhu.edu, cschafer@cs.jhu.edu, yarowsky@c ...

  • Venue:
  • Natural Language Engineering
  • Year:
  • 2002

Abstract

Classifier combination is an effective and broadly useful method of improving system performance. This article investigates in depth a large number of both well-established and novel classifier combination approaches for the word sense disambiguation task, studied over a diverse classifier pool that includes feature-enhanced Naïve Bayes, Cosine, Decision List, Transformation-based Learning and MMVC classifiers. Each classifier has access to the same rich feature space, comprising distance-weighted bag-of-lemmas, local n-gram context and specific syntactic relations, such as Verb-Object and Noun-Modifier. This study examines several key issues in system combination for the word sense disambiguation task, ranging from algorithmic structure to parameter estimation. Experiments using the standard SENSEVAL2 lexical-sample data sets in four languages (English, Spanish, Swedish and Basque) demonstrate that the combination system obtains a significantly lower error rate than the other systems participating in the SENSEVAL2 exercise, yielding state-of-the-art performance on these data sets.
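
To make the general idea of classifier combination concrete, the following is a minimal sketch of two generic schemes: majority voting over predicted senses and weighted averaging of per-classifier sense distributions. It is not the combination method evaluated in the article. The function names, the target word, the sense labels and all probabilities are hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the sense label chosen by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def average_posteriors(posteriors, weights=None):
    """Combine per-classifier sense distributions by weighted averaging.

    posteriors: list of dicts mapping sense label -> probability.
    weights:    optional per-classifier weights (assumed non-negative,
                not all zero); defaults to equal weights.
    """
    if weights is None:
        weights = [1.0] * len(posteriors)
    total = sum(weights)
    combined = {}
    for dist, weight in zip(posteriors, weights):
        for sense, prob in dist.items():
            combined[sense] = combined.get(sense, 0.0) + weight * prob / total
    return combined


# Hypothetical outputs of three classifiers for one occurrence of "bank".
naive_bayes = {"river": 0.6, "finance": 0.4}
cosine = {"river": 0.3, "finance": 0.7}
decision_list = {"river": 0.2, "finance": 0.8}

# Each classifier's top-ranked sense, used for voting.
votes = [max(d, key=d.get) for d in (naive_bayes, cosine, decision_list)]
voted_sense = majority_vote(votes)

# Equal-weight average of the three distributions.
combined = average_posteriors([naive_bayes, cosine, decision_list])
averaged_sense = max(combined, key=combined.get)
```

In this toy case both schemes select the same sense. They can disagree when one classifier is very confident and the others are only weakly committed, which is one reason the choice of combination scheme and of its weights matters.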