Scaling context space

Authors:
James R. Curran;Marc Moens
Affiliations:
University of Edinburgh, Edinburgh, United Kingdom;University of Edinburgh, Edinburgh, United Kingdom
Venue:
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Year:
2002

Citing 15
Cited 17

A cluster-based approach to thesaurus construction

SIGIR '88 Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval
Class-based n-gram models of natural language

Computational Linguistics
Distributional clustering of words for text classification

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Deriving concept hierarchies from text

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Explorations in Automatic Thesaurus Discovery

Explorations in Automatic Thesaurus Discovery
An Information-Theoretic Definition of Similarity

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Automatic Detection of Thesaurus relations for Information Retrieval Applications

Foundations of Computer Science: Potential - Theory - Cognition, to Wilfried Brauer on the occasion of his sixtieth birthday
Partial parsing via finite-state cascades

Natural Language Engineering
Distributional clustering of English words

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Automatic acquisition of hyponyms from large text corpora

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Automatic construction of a hypernym-labeled noun hierarchy from text

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Scaling to very very large corpora for natural language disambiguation

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Class-based probability estimation using a semantic hierarchy

NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Robust, applied morphological generation

INLG '00 Proceedings of the first international conference on Natural language generation - Volume 14
Improvements in automatic thesaurus extraction

ULA '02 Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition - Volume 9

Using the web to obtain frequencies for unseen bigrams

Computational Linguistics - Special issue on web as corpus
Improvements in automatic thesaurus extraction

ULA '02 Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition - Volume 9
Ensemble methods for automatic thesaurus extraction

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
A very very large corpus doesn't always yield reliable estimates

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Blueprint for a high performance NLP infrastructure

SEALTS '03 Proceedings of the HLT-NAACL 2003 workshop on Software engineering and architecture of language technology systems - Volume 8
Supersense tagging of unknown nouns using semantic similarity

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Towards terascale knowledge acquisition

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Dependency-Based Construction of Semantic Space Models

Computational Linguistics
MetaCoDe: A Lightweight UMLS Mapping Tool

AIME '07 Proceedings of the 11th conference on Artificial Intelligence in Medicine
Methodological Review: Empirical distributional semantics: Methods and biomedical applications

Journal of Biomedical Informatics
Random indexing using statistical weight functions

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Approximate searching for distributional similarity

DeepLA '05 Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition
Data selection in semi-supervised learning for name tagging

IEBeyondDoc '06 Proceedings of the Workshop on Information Extraction Beyond The Document
Lexical acquisition for clinical text mining using distributional similarity

CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
SemEval-2012 task 4: evaluating Chinese word similarity

SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Automatic thesaurus construction for cross generation corpus

Journal on Computing and Cultural Heritage (JOCCH)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Context is used in many NLP systems as an indicator of a term's syntactic and semantic function. The accuracy of the system is dependent on the quality and quantity of contextual information available to describe each term. However, the quantity variable is no longer fixed by limited corpus resources. Given fixed training time and computational resources, it makes sense for systems to invest time in extracting high quality contextual information from a fixed corpus. However, with an effectively limitless quantity of text available, extraction rate and representation size need to be considered. We use thesaurus extraction with a range of context extracting tools to demonstrate the interaction between context quantity, time and size on a corpus of 300 million words.