Unitary operators on the document space

Authors:
Eduard Hoenkamp
Affiliations:
Nijmegen Institute for Cognition and Information, University of Nijmegen, Montessorilaan 3, 6525 HR, Nijmegen, The Netherlands
Venue:
Journal of the American Society for Information Science and Technology - Mathematical, logical, and formal methods in information retrieval
Year:
2003

Abstract

When people search for documents, they ultimately want content, not words. Search engines should therefore relate documents by their underlying concepts rather than by the words they contain. One promising technique for doing so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts needed to represent the documents is far smaller than the variety of words in their textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article makes three contributions. First, it shows that the technique underlying LSI is just one example of a unitary operator, for which computationally more attractive alternatives exist. Second, it proposes the Haar transform as such an alternative, as it is memory efficient and can be computed in linear to sublinear time. Third, it generalizes LSI through a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational cost, but also opens a spectrum of possibilities for new research.
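
To make the notion of a unitary operator concrete, the following is a minimal sketch of an orthonormal Haar transform applied to two document vectors. It is an illustration only, not the article's implementation: the function names `haar_transform` and `dot`, and the term-count vectors `doc_a` and `doc_b`, are hypothetical, and the vector length is assumed to be a power of two. The sketch shows two properties the abstract relies on: the transform leaves inner products between documents unchanged (so similarity judgments carry over), and it runs in time linear in the vector length.

```python
import math


def haar_transform(vector):
    """Orthonormal Haar transform of a list whose length is a power of two.

    Each pass replaces adjacent pairs by a scaled sum (coarse part) and a
    scaled difference (detail part), then recurses on the coarse half.
    Total work is n + n/2 + n/4 + ... < 2n, i.e. linear in n.
    """
    n = len(vector)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(x) for x in vector]
    scale = 1.0 / math.sqrt(2.0)  # keeps the transform norm-preserving
    length = n
    while length > 1:
        half = length // 2
        coarse = [(out[2 * i] + out[2 * i + 1]) * scale for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) * scale for i in range(half)]
        out[:length] = coarse + detail
        length = half
    return out


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical term-count vectors for two documents over four terms.
doc_a = [2, 0, 1, 1]
doc_b = [1, 1, 0, 2]

before = dot(doc_a, doc_b)
after = dot(haar_transform(doc_a), haar_transform(doc_b))
print(before, round(after, 9))
```

Because the inner product is the same before and after the transform, any cosine-style ranking computed in the original space is reproduced in the transformed one. Dimension reduction would then come from discarding the fine-detail coefficients at the end of the transformed vector, which is one way to read the multiresolution representation mentioned above.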