Ensemble methods for automatic thesaurus extraction

Authors:
James R. Curran
Affiliations:
University of Edinburgh, Edinburgh, United Kingdom
Venue:
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Year:
2002

Citing 16
Cited 16

Class-based n-gram models of natural language

Computational Linguistics
Distributional clustering of words for text classification

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Explorations in Automatic Thesaurus Discovery

Explorations in Automatic Thesaurus Discovery
An Information-Theoretic Definition of Similarity

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Automatic Detection of Thesaurus relations for Information Retrieval Applications

Foundations of Computer Science: Potential - Theory - Cognition, to Wilfried Brauer on the occasion of his sixtieth birthday
Noun phrase recognition by system combination

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
A simple approach to building ensembles of Naive Bayesian classifiers for word sense disambiguation

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Partial parsing via finite-state cascades

Natural Language Engineering
Classifier combination for improved lexical disambiguation

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Improving data driven wordclass tagging by system combination

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Distributional clustering of English words

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Scaling to very very large corpora for natural language disambiguation

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Scaling context space

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Class-based probability estimation using a semantic hierarchy

NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Robust, applied morphological generation

INLG '00 Proceedings of the first international conference on Natural language generation - Volume 14
Improvements in automatic thesaurus extraction

ULA '02 Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition - Volume 9

Using appraisal groups for sentiment analysis

Proceedings of the 14th ACM international conference on Information and knowledge management
Optimizing synonym extraction using monolingual and bilingual resources

PARAPHRASE '03 Proceedings of the second international workshop on Paraphrasing - Volume 16
Measuring semantic similarity between words using web search engines

Proceedings of the 16th international conference on World Wide Web
A survey on sentiment detection of reviews

Expert Systems with Applications: An International Journal
Predicting strong associations on the basis of corpus data

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Graph-based word clustering using a web search engine

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Robust estimation of Google counts for social network extraction

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Data selection in semi-supervised learning for name tagging

IEBeyondDoc '06 Proceedings of the Workshop on Information Extraction Beyond The Document
Combining syntactic co-occurrences and nearest neighbours in distributional methods to remedy data sparseness

UMSLLS '09 Proceedings of the Workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics
A relational model of semantic similarity between words using automatically extracted lexical pattern clusters from the web

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Thesaurus extension using web search engines

ICADL'10 Proceedings of the role of digital libraries in a time of global change, and 12th international conference on Asia-Pacific digital libraries
Ontology-based information content computation

Knowledge-Based Systems
Ontology-based semantic similarity: A new feature-based approach

Expert Systems with Applications: An International Journal
Ensemble-based semantic lexicon induction for semantic tagging

SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
A study of hybrid similarity measures for semantic relation extraction

HYBRID '12 Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data
A New Model to Compute the Information Content of Concepts from Taxonomic Knowledge

International Journal on Semantic Web & Information Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Ensemble methods are state of the art for many NLP tasks. Recent work by Banko and Brill (2001) suggests that this would not necessarily be true if very large training corpora were available. However, their results are limited by the simplicity of their evaluation task and individual classifiers.Our work explores ensemble efficacy for the more complex task of automatic thesaurus extraction on up to 300 million words. We examine our conflicting results in terms of the constraints on, and complexity of, different contextual representations, which contribute to the sparseness-and noise-induced bias behaviour of NLP systems on very large corpora.