Unitary operators on the document space

  • Authors:
  • Eduard Hoenkamp

  • Affiliations:
  • Nijmegen Institute for Cognition and Information, University of Nijmegen, Montessorilaan 3, 6525 HR, Nijmegen, The Netherlands

  • Venue:
  • Journal of the American Society for Information Science and Technology - Mathematical, logical, and formal methods in information retrieval
  • Year:
  • 2003

Abstract

When people search for documents, they ultimately want content, not words. Hence, search engines should relate documents by their underlying concepts rather than by the words they contain. One promising technique for doing so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts that can represent the documents is far smaller than the number of distinct words in their textual representation. Although this nearly obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows that the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, because it is memory efficient and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational cost but also opens a spectrum of possibilities for new research.
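The abstract's central observation is that the operator behind LSI is unitary, and that the Haar transform is a cheaper unitary alternative. The following is a minimal illustrative sketch, not the author's implementation (the abstract does not describe one). It applies an orthonormal, multilevel 1-D Haar transform to a single document's term vector. It shows three things: the transform runs in linear time, it preserves inner products and therefore cosine similarity, and keeping only the leading (coarse) coefficients gives a lower-resolution view of the document. The function names, the example term vectors, the arbitrary term ordering, and the power-of-two length requirement (pad with zeros otherwise) are all assumptions made for illustration.

```python
import math

def haar_transform(vec):
    """Orthonormal multilevel Haar transform (hypothetical helper).

    Output layout: [coarsest average, coarsest detail, ..., finest details].
    Assumes len(vec) is a power of two. Work is n + n/2 + ... < 2n, i.e. O(n).
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(x) for x in vec]
    s = 1.0 / math.sqrt(2.0)  # normalization that makes each step orthogonal
    length = n
    while length > 1:
        half = length // 2
        approx = [(out[2 * i] + out[2 * i + 1]) * s for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) * s for i in range(half)]
        out[:length] = approx + detail  # recurse only on the averages
        length = half
    return out

def inverse_haar_transform(coeffs):
    """Undo haar_transform; the operator is orthogonal, so no information is lost."""
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(c) for c in coeffs]
    s = 1.0 / math.sqrt(2.0)
    length = 1
    while length < n:
        merged = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged.extend(((a + d) * s, (a - d) * s))
        out[:2 * length] = merged
        length *= 2
    return out

def cosine(u, v):
    """Cosine similarity, the usual document-matching score."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def coarse(coeffs, k):
    """Keep the first k (coarsest) coefficients and zero the rest (illustrative)."""
    return list(coeffs[:k]) + [0.0] * (len(coeffs) - k)

if __name__ == "__main__":
    # Hypothetical term-frequency vectors for two documents over 8 terms.
    doc_a = [3, 0, 1, 2, 0, 0, 4, 1]
    doc_b = [2, 1, 0, 2, 0, 1, 3, 0]
    ha, hb = haar_transform(doc_a), haar_transform(doc_b)
    print(cosine(doc_a, doc_b), cosine(ha, hb))  # equal up to floating-point rounding
    print(cosine(coarse(ha, 4), coarse(hb, 4)))  # similarity at a coarser resolution
```

Cosine similarity is unchanged by any unitary operator, so matching can happen in the transformed space. The truncation in `coarse` is only one simple way to obtain a multiresolution view. It is not necessarily how the article constructs its generalization of LSI.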