Selecting the most highly correlated pairs within a large vocabulary

Authors:
Kyoji Umemura
Affiliations:
Toyohashi University of Technology
Venue:
SEMANET '02 Proceedings of the 2002 workshop on Building and using semantic networks - Volume 11
Year:
2002



Abstract

Occurrence patterns of words in documents can be expressed as binary vectors, with one component per document. When the vectors of two words are similar, the two words may have some implicit relationship with each other; we call such a pair of words a correlated pair. This report describes a method for obtaining a given number of the most highly correlated pairs. In practice, the method requires O(N log N) computation time and O(N) memory space, where N is the number of documents or records. Since these costs do not depend on the size of the vocabulary under analysis, it is possible to compute correlations between all the words in a corpus.
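The abstract does not spell out the algorithm, so the following is only a brute-force baseline for the same task and not the author's method. It represents each word's binary occurrence vector sparsely as the set of document indices containing it, scores every word pair, and keeps the top k. Its cost grows with the square of the vocabulary size, which is exactly the dependence the paper's method is said to avoid. The similarity measure (cosine similarity on binary vectors), the whitespace tokenization, and the function names `occurrence_vectors`, `cosine`, and `top_k_pairs` are all illustrative assumptions.

```python
import heapq
import math
from itertools import combinations


def occurrence_vectors(docs):
    """Map each word to the set of indices of documents that contain it.

    The index set is a sparse encoding of the word's binary occurrence
    vector: component i is 1 exactly when document i contains the word.
    Tokenization by whitespace is an assumption made for illustration.
    """
    vectors = {}
    for i, doc in enumerate(docs):
        for word in set(doc.split()):
            vectors.setdefault(word, set()).add(i)
    return vectors


def cosine(a, b):
    """Cosine similarity of two binary vectors given as index sets.

    This is an assumed similarity measure; the abstract does not name one.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def top_k_pairs(docs, k):
    """Brute-force baseline: score every word pair, keep the k most similar.

    Returns (score, word1, word2) tuples in descending order of score.
    """
    vectors = occurrence_vectors(docs)
    scored = (
        (cosine(vectors[w1], vectors[w2]), w1, w2)
        for w1, w2 in combinations(sorted(vectors), 2)
    )
    return heapq.nlargest(k, scored)
```

For example, with `docs = ["cat dog", "cat dog fish", "fish bird", "bird"]`, the words `cat` and `dog` occur in exactly the same documents, so `top_k_pairs(docs, 1)` returns `[(1.0, 'cat', 'dog')]`. Using `heapq.nlargest` keeps only k candidates in memory while scanning the pairs, but the scan itself still visits every pair.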