Ensemble methods for automatic thesaurus extraction

  • Authors:
  • James R. Curran

  • Affiliations:
  • University of Edinburgh, Edinburgh, United Kingdom

  • Venue:
  • EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

Ensemble methods are state of the art for many NLP tasks. Recent work by Banko and Brill (2001) suggests that this would not necessarily be true if very large training corpora were available. However, their results are limited by the simplicity of their evaluation task and individual classifiers.Our work explores ensemble efficacy for the more complex task of automatic thesaurus extraction on up to 300 million words. We examine our conflicting results in terms of the constraints on, and complexity of, different contextual representations, which contribute to the sparseness-and noise-induced bias behaviour of NLP systems on very large corpora.