A new method for information retrieval, based on the theory of relative concentration

  • Authors:
  • L. Egghe

  • Affiliations:
  • LUC, Universitaire Campus, B-3610 Diepenbeek, Belgium and UIA, Universiteitsplein 1, B-2610 Wilrijk, Belgium

  • Venue:
  • SIGIR '90 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 1989

Abstract

This paper introduces a new method for retrieving documents that are represented as vectors. The novelty of the algorithm is that the matching function between the query and the document is not a (generalized) p-norm, as used e.g. by Salton and others. Instead, it is a function that measures the relative dispersion of the terms between a document and a query. This function originates from an earlier paper by the author, which introduced a good measure of relative concentration; in informetrics, that measure is used to quantify the degree to which a journal is specialized with respect to the entire subject. The new information retrieval algorithm is shown to have many desirable properties (in the sense of the new Cater-Kraft wish list), including those of Salton's original cosine matching function. In addition, the cosine matching function reduces to Boolean IR when only the weights 0 and 1 are used. This property is refined here so that the broadness or specialization of a document and a query is taken into account, and the new matching function satisfies these additional properties.
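For reference, the abstract contrasts the new matching function with Salton's cosine matching function. The Python sketch below implements only that cosine baseline. It is not the author's relative-concentration function, whose formula is not given in this abstract. Documents and queries are assumed to be dicts mapping terms to weights, and the helper `binary_vector` and the example term lists are hypothetical. With 0/1 weights, the cosine score simplifies to |D ∩ Q| / sqrt(|D| · |Q|), so the score depends both on how many query terms a document shares and on how many terms the document carries overall.

```python
import math


def cosine(doc, query):
    """Salton's cosine matching function between two term-weight vectors.

    Assumed representation: each vector is a dict mapping term -> weight;
    terms absent from a dict have weight 0.
    """
    # Inner product over the terms the document contains.
    dot = sum(w * query.get(t, 0.0) for t, w in doc.items())
    # Euclidean (p = 2) norms of both vectors.
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_d * norm_q)


def binary_vector(terms):
    """Hypothetical helper: 0/1 weighting, i.e. the Boolean-IR case."""
    return {t: 1.0 for t in terms}


# Hypothetical query and documents; both documents contain every query term.
query = binary_vector(["retrieval", "concentration"])
narrow = binary_vector(["retrieval", "concentration", "informetrics"])
broad = binary_vector(["retrieval", "concentration", "informetrics",
                       "indexing", "clustering", "thesaurus"])

print(round(cosine(narrow, query), 4))  # 2 / sqrt(3 * 2) -> 0.8165
print(round(cosine(broad, query), 4))   # 2 / sqrt(6 * 2) -> 0.5774
```

Both documents contain every query term, yet the one indexed with fewer terms scores higher. With binary weights, the cosine therefore ranks by shared terms relative to vector size rather than by a pure yes/no Boolean match.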