Selecting the most highly correlated pairs within a large vocabulary

  • Authors:
  • Kyoji Umemura

  • Affiliations:
  • Toyohashi University of Technology

  • Venue:
  • SEMANET '02 Proceedings of the 2002 workshop on Building and using semantic networks - Volume 11
  • Year:
  • 2002

Abstract

Occurrence patterns of words in documents can be expressed as binary vectors. When two vectors are similar, the two words corresponding to the vectors may have some implicit relationship with each other. We call these two words a correlated pair. This report describes a method for obtaining the most highly correlated pairs of a given size. In practice, the method requires O(N × log(N)) computation time and O(N) memory space, where N is the number of documents or records. Since this does not depend on the size of the vocabulary under analysis, it is possible to compute correlations between all the words in a corpus.
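
To make the notion of a correlated pair concrete, the following is a minimal brute-force sketch and not the method of the paper. It assumes the Jaccard coefficient as the similarity measure (the abstract does not name one), represents each binary vector as the set of document indices where the word occurs, and compares every pair of words, so its cost grows quadratically with the vocabulary rather than as O(N × log(N)). The names occurrence_sets, jaccard, and top_pairs are hypothetical.

```python
from itertools import combinations
import heapq


def occurrence_sets(docs):
    """Map each word to the set of document indices in which it occurs.

    The set is a sparse form of the binary occurrence vector.
    """
    occ = {}
    for i, doc in enumerate(docs):
        for word in set(doc.split()):
            occ.setdefault(word, set()).add(i)
    return occ


def jaccard(a, b):
    """Similarity of two occurrence sets (assumed measure, not from the paper)."""
    return len(a & b) / len(a | b)


def top_pairs(docs, k):
    """Return the k highest-scoring word pairs as (score, word1, word2) tuples.

    Brute force over all pairs of words; for illustration only.
    """
    occ = occurrence_sets(docs)
    scored = (
        (jaccard(occ[u], occ[v]), u, v)
        for u, v in combinations(sorted(occ), 2)
    )
    return heapq.nlargest(k, scored)


docs = [
    "new york city",
    "new york times",
    "los angeles times",
    "los angeles city",
]
result = top_pairs(docs, 2)
```

In this toy corpus, "new" and "york" occur in exactly the same documents, as do "los" and "angeles", so those two pairs receive the maximum score of 1.0 and are the ones returned.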