Automated information retrieval relies heavily on statistical regularities that emerge as terms are deposited to produce text. This paper examines the statistical patterns expected of a pair of semantically related terms. Guided by a conceptualization of the text generation process, we derive measures of how tightly two terms are semantically associated. Our main objective is to probe whether such measures yield reasonable results. Specifically, we examine how the tendency of a content-bearing term to clump, as quantified by previously developed measures of term clumping, is influenced by the presence of other terms. This approach lets us present a toolkit from which a range of measures can be constructed. As an illustration, one of the suggested measures is evaluated on a large text corpus built from an online encyclopedia.
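The abstract does not spell out its measures. The sketch below is only a rough illustration of the general idea. It assumes a simple condensation-style clumping ratio: the number of text units that contain a term, divided by the number expected if the term's occurrences were scattered uniformly and independently over the units. Values below 1 mean the term is packed into fewer units than chance would predict. The function names `condensation` and `conditional_condensation` are hypothetical. So is the choice to condition on a second term by splitting the units according to whether that term appears. None of this is taken from the paper.

```python
import math
from typing import Sequence, Tuple

# A "unit" is one block of text (e.g. a paragraph) given as a list of tokens.
Units = Sequence[Sequence[str]]


def expected_units_hit(n_occ: int, n_units: int) -> float:
    """Expected number of units holding at least one of n_occ occurrences,
    if each occurrence lands in a uniformly random unit independently."""
    return n_units * (1.0 - (1.0 - 1.0 / n_units) ** n_occ)


def condensation(term: str, units: Units) -> float:
    """Observed / expected count of units containing `term` (hypothetical measure).

    Returns NaN when there are no units or the term never occurs,
    since the ratio is undefined in those cases."""
    n_occ = sum(list(u).count(term) for u in units)
    if not units or n_occ == 0:
        return math.nan
    observed = sum(1 for u in units if term in u)
    return observed / expected_units_hit(n_occ, len(units))


def conditional_condensation(term: str, other: str, units: Units) -> Tuple[float, float]:
    """Clumping of `term` within units that contain `other` vs. units that do not.

    This is one assumed way to ask how the presence of another term
    influences a term's clumping; it is not the paper's published measure."""
    with_other = [u for u in units if other in u]
    without_other = [u for u in units if other not in u]
    return condensation(term, with_other), condensation(term, without_other)


if __name__ == "__main__":
    corpus = [["a", "a", "b"], ["a", "b"], ["c"], ["c", "d"]]
    print(condensation("a", corpus))                   # 2 / 2.3125
    print(conditional_condensation("a", "b", corpus))  # (2 / 1.75, nan)
```

In the toy corpus, "a" occurs three times across four units but appears in only two of them, so its ratio falls below 1. All of its occurrences sit in units that also contain "b", which leaves the "without b" ratio undefined. A real measure would also need to handle sparse subsets and unit-length differences, which this sketch ignores.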