A new method for information retrieval, based on the theory of relative concentration

Authors:
L. Egghe
Affiliations:
LUC, Universitaire Campus, B-3610 Diepenbeek, Belgium and UIA, Universiteitsplein 1, B-2610 Wilrijk, Belgium
Venue:
SIGIR '90 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
1989

Citing 4
Cited 1

A generalization and clarification of the Waller-Kraft wish list

Information Processing and Management: an International Journal - Modeling data, information and knowledge
Extended boolean retrieval: a heuristic approach?

SIGIR '90 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval
Extended Boolean information retrieval

Communications of the ACM
Automatic Information Organization and Retrieval.

Automatic Information Organization and Retrieval.

Construction of concentration measures for General Lorenz curves using Riemann-Stieltjes integrals

Mathematical and Computer Modelling: An International Journal

Abstract

This paper introduces a new method for retrieving documents that are represented by vectors. The novelty of the algorithm is that the matching function between the query and the document is not a (generalized) p-norm (as used, e.g., by Salton and others) but a function that measures the relative dispersion of the terms between a document and a query. This function originates from an earlier paper by the author, which introduced a good measure of relative concentration, used in informetrics to measure the degree of specialization of a journal w.r.t. the entire subject.

This new information retrieval algorithm is shown to have many desirable properties (in the sense of the new Cater-Kraft wish list), including those of the original cosine-matching function of Salton. In addition, the property of the cosine-matching function that, if one uses only the weights 0 or 1, one is reduced to Boolean IR is refined in the sense that one takes into consideration the broadness or specialization of a document and a query. Our new matching function satisfies these additional properties.
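
As a point of reference for the cosine-matching function mentioned in the abstract, a minimal Python sketch follows. It illustrates only the standard cosine measure with binary weights, not the relative-concentration function the paper proposes, whose definition is not given in this record. The names `cosine_match`, `doc`, and `query` are illustrative assumptions.

```python
import math


def cosine_match(doc, query):
    """Cosine of the angle between two equal-length term-weight vectors."""
    if len(doc) != len(query):
        raise ValueError("vectors must have the same length")
    dot = sum(d * q for d, q in zip(doc, query))
    norm = math.sqrt(sum(d * d for d in doc)) * math.sqrt(sum(q * q for q in query))
    # An all-zero vector has no direction; treat it as a non-match.
    return dot / norm if norm else 0.0


# Binary weights (assumed encoding): 1 if the term is present, 0 otherwise.
doc = [1, 1, 0, 1]
query = [1, 0, 0, 1]

# Number of terms present in both the document and the query.
shared = sum(1 for d, q in zip(doc, query) if d and q)
score = cosine_match(doc, query)
```

With 0/1 weights the score depends only on which terms are present: it equals the number of shared terms divided by the square root of the product of the two term counts, which is the set-based behaviour the abstract relates to Boolean IR.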