A generalization and clarification of the Waller-Kraft wish list
Information Processing and Management: an International Journal - Modeling data, information and knowledge
Extended boolean retrieval: a heuristic approach?
SIGIR '90 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval
Extended Boolean information retrieval
Communications of the ACM
Automatic Information Organization and Retrieval.
Construction of concentration measures for General Lorenz curves using Riemann-Stieltjes integrals
Mathematical and Computer Modelling: An International Journal
This paper introduces a new method for information retrieval of documents that are represented by vectors. The novelty of the algorithm is that the matching function between the query and the document is not a (generalized) p-norm (as used e.g. by Salton and others) but a function that measures the relative dispersion of the terms between a document and a query. This function originates from an earlier paper of the author, which introduced a good measure of relative concentration, used in informetrics to measure the degree of specialization of a journal w.r.t. the entire subject.

This new information retrieval algorithm is shown to have many desirable properties (in the sense of the new Cater-Kraft wish list), including those of the original cosine-matching function of Salton. In addition, the property of the cosine-matching function that restricting the weights to 0 or 1 reduces it to Boolean IR is refined: the broadness or specialization of a document and a query is taken into consideration. Our new matching function satisfies these additional properties.
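
As a point of reference for the cosine-matching function mentioned above, the following is a minimal sketch of the standard cosine measure between two weight vectors. It is not the dispersion-based matching function proposed in the paper, which the abstract does not specify. The function name `cosine_match`, the example vectors, and the convention of returning 0.0 for an all-zero vector are assumptions made for illustration only.

```python
import math


def cosine_match(doc, query):
    """Cosine of the angle between two equal-length term-weight vectors."""
    if len(doc) != len(query):
        raise ValueError("vectors must have the same length")
    dot = sum(d * q for d, q in zip(doc, query))
    norm_d = math.sqrt(sum(d * d for d in doc))
    norm_q = math.sqrt(sum(q * q for q in query))
    if norm_d == 0 or norm_q == 0:
        # Assumed convention: an empty vector matches nothing.
        return 0.0
    return dot / (norm_d * norm_q)


# With weights restricted to 0 or 1, the score depends only on set sizes:
# (number of shared terms) / sqrt(terms in document * terms in query).
query = [1, 1, 0, 0]
narrow_doc = [1, 1, 0, 0]  # contains exactly the query terms
broad_doc = [1, 1, 1, 1]   # contains the query terms plus two others

narrow_score = cosine_match(narrow_doc, query)
broad_score = cosine_match(broad_doc, query)
```

In this illustration both documents contain every query term, yet the broader document receives the lower score, which gives a rough sense of how broadness can enter a matching function even in the 0/1 case.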