A measure theoretic approach to information retrieval

Authors:
Sándor Dominich; Tamás Kiezer
Affiliations:
Department of Computer Science, University of Veszprém, Egyetem u. 10, 8200 Veszprém, Hungary (both authors)
Venue:
Journal of the American Society for Information Science and Technology
Year:
2007

Abstract

The vector space model of information retrieval is one of the classical and widely applied retrieval models. Paradoxically, it has been characterized by a discrepancy between its formal framework and its implementable form. The underlying concepts of the vector space model are mathematical ones: linear space, vector, and inner product. However, in the vector space model the mathematical meaning of these concepts is not preserved; they are used as mere computational constructs or metaphors. Thus, the vector space model does not actually follow formally from the mathematical concepts on which it is claimed to rest. This problem has been recognized for more than two decades, but no proper solution has emerged so far. The present article proposes a solution. First, the concept of retrieval is defined based on mathematical measure theory. Then, retrieval is particularized using fuzzy set theory. As a result, the retrieval function is conceived as the cardinality of the intersection of two fuzzy sets. This view makes it possible to build a connection to linear spaces. It is shown that the classical and the generalized vector space models, as well as the latent semantic indexing model, gain a correct formal background with which they are consistent. At the same time, it becomes clear that the inner product is not a necessary ingredient of the vector space model, and hence of Information Retrieval (IR). The Principle of Object Invariance is introduced to handle this situation. Moreover, this view makes it possible to consistently formulate new retrieval methods: retrieval in a linear space with a general basis, entropy-based retrieval, and probability-based retrieval. It is also shown that IR may be viewed as integral calculus, and thus gains a very compact and elegant mathematical notation. IR may thus also be conceived as an application of mathematical measure theory. © 2007 Wiley Periodicals, Inc.
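The central construction above, a retrieval function defined as the cardinality of the intersection of two fuzzy sets, can be illustrated with a small sketch. It assumes the standard min-based (Zadeh) intersection and the sigma-count (sum of membership degrees) as the fuzzy cardinality; the abstract does not state which operators the article actually uses, and the names `FuzzySet`, `fuzzy_intersection`, `sigma_count`, `retrieval_score`, and `inner_product` are hypothetical. Over a finite set of terms, the sigma-count equals the integral of the membership function with respect to the counting measure, which is one simple way to read the claim that retrieval can be written as an integral.

```python
from typing import Dict

# Hypothetical representation (an assumption, not the article's notation):
# a fuzzy set over index terms, mapping each term to a membership degree in [0, 1].
# Terms absent from the dict have membership 0.
FuzzySet = Dict[str, float]


def fuzzy_intersection(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Zadeh intersection: membership is the minimum of the two degrees.

    Terms missing from either set have membership 0, so only shared terms remain.
    """
    return {t: min(a[t], b[t]) for t in a.keys() & b.keys()}


def sigma_count(s: FuzzySet) -> float:
    """Sigma-count cardinality: the sum of membership degrees.

    On a finite term set this is the integral of the membership
    function with respect to the counting measure.
    """
    return sum(s.values())


def retrieval_score(query: FuzzySet, doc: FuzzySet) -> float:
    """Score a document as the cardinality of (query intersected with doc)."""
    return sigma_count(fuzzy_intersection(query, doc))


def inner_product(query: FuzzySet, doc: FuzzySet) -> float:
    """Classical vector space model score, shown only for comparison."""
    return sum(query[t] * doc[t] for t in query.keys() & doc.keys())


if __name__ == "__main__":
    q = {"measure": 1.0, "retrieval": 0.8}
    d1 = {"measure": 0.5, "retrieval": 0.4, "fuzzy": 0.9}
    d2 = {"retrieval": 0.9, "vector": 0.7}
    for name, d in (("d1", d1), ("d2", d2)):
        print(name, retrieval_score(q, d), inner_product(q, d))
```

In this sketch the fuzzy score is computed from set operations and a measure alone, with no inner product involved; the dot product is included only to show that the two scores generally differ.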