A general matrix framework for modelling Information Retrieval

  • Authors:
  • Thomas Rölleke; Theodora Tsikrika; Gabriella Kazai

  • Affiliations:
  • Department of Computer Science, Queen Mary University of London, London E1 4NS, UK; Department of Computer Science, Queen Mary University of London, London E1 4NS, UK; Department of Computer Science, Queen Mary University of London, London E1 4NS, UK

  • Venue:
  • Information Processing and Management: an International Journal - Special issue: Formal methods for information retrieval
  • Year:
  • 2006

Abstract

In this paper, we present a well-defined, general matrix framework for modelling Information Retrieval (IR). In this framework, collections, documents and queries correspond to matrix spaces. Retrieval aspects, such as content, structure and semantics, are expressed by matrices defined in these spaces and by matrix operations applied to them. The dualities of these spaces are identified by applying frequency-based operations to the proposed matrices and by investigating the meaning of their eigenvectors. This allows term weighting concepts used for content-based retrieval, such as term frequency and inverse document frequency, to translate directly into concepts for structure-based retrieval. In addition, concepts such as pagerank, authorities and hubs, which are determined by exploiting the structural relationships between linked documents, can be defined with respect to the semantic relationships between terms. Moreover, the framework can be used to express classical and alternative evaluation measures, for instance measures involving the structure of documents, and to further explain and relate IR models and theory. The high level of reusability and abstraction of the framework yields a logical layer for IR that makes system design and construction significantly more efficient, so that better and increasingly personalised systems can be built at lower cost.
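To make the content/structure duality more concrete, here is a minimal Python sketch built on assumptions of our own. It uses a hypothetical three-document collection, takes idf as log(N / n_j), and computes pagerank by power iteration with damping 0.85. One column-statistic function gives idf on a document-term matrix and an "inverse link frequency" on a document-link matrix. One eigenvector routine ranks documents by links and terms by co-occurrence. The names (`inverse_frequency`, `pagerank`, `DOC_TERM`, `DOC_LINK`) are illustrative only and are not the paper's notation or definitions.

```python
import math

# Hypothetical toy data (not from the paper).
# Document-term matrix: rows are documents d0..d2, columns are terms.
TERMS = ["ir", "matrix", "link"]
DOC_TERM = [
    [2, 1, 0],  # d0
    [0, 3, 1],  # d1
    [1, 0, 0],  # d2
]
# Document-document link matrix: DOC_LINK[i][j] = 1 if document i links to j.
DOC_LINK = [
    [0, 1, 1],  # d0 -> d1, d2
    [0, 0, 1],  # d1 -> d2
    [1, 0, 0],  # d2 -> d0
]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse_frequency(m):
    """log(N / n_j) per column j, where N is the number of rows and n_j the
    number of rows with a non-zero entry in column j (0.0 if n_j == 0)."""
    n_rows = len(m)
    result = []
    for col in transpose(m):
        n_j = sum(1 for x in col if x != 0)
        result.append(math.log(n_rows / n_j) if n_j else 0.0)
    return result


def tf_idf(m):
    """Scale each raw frequency by the inverse frequency of its column."""
    weights = inverse_frequency(m)
    return [[x * w for x, w in zip(row, weights)] for row in m]


def pagerank(adj, damping=0.85, iterations=100):
    """Power iteration for the principal eigenvector of the damped,
    row-normalised transition matrix built from adj."""
    n = len(adj)
    rank = [1.0 / n] * n
    out_weight = [sum(row) for row in adj]
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                if out_weight[i] == 0:
                    # Dangling node: distribute its rank uniformly.
                    new[j] += damping * rank[i] / n
                elif adj[i][j]:
                    new[j] += damping * rank[i] * adj[i][j] / out_weight[i]
        rank = new
    return rank


# Content: idf of terms, i.e. a column statistic of the document-term matrix.
term_idf = inverse_frequency(DOC_TERM)
doc_weights = tf_idf(DOC_TERM)
# Structure: the same operation on the link matrix gives an "inverse link
# frequency" per document (low for documents that many others link to).
doc_ilf = inverse_frequency(DOC_LINK)
# Structure: pagerank over the document link graph.
doc_rank = pagerank(DOC_LINK)
# Semantics: pagerank over a term-term co-occurrence matrix C = B^T B, where
# B is the binary document-term matrix; self-co-occurrence is set to zero.
binary = [[1 if x else 0 for x in row] for row in DOC_TERM]
cooc = matmul(transpose(binary), binary)
for k in range(len(cooc)):
    cooc[k][k] = 0
term_rank = pagerank(cooc)

print("idf:", dict(zip(TERMS, (round(w, 3) for w in term_idf))))
print("tf-idf:", [[round(w, 3) for w in row] for row in doc_weights])
print("inverse link frequency:", [round(w, 3) for w in doc_ilf])
print("document pagerank:", [round(r, 3) for r in doc_rank])
print("term pagerank:", dict(zip(TERMS, (round(r, 3) for r in term_rank))))
```

In this toy collection, d2 is the only document with two incoming links. It therefore gets the lowest inverse link frequency and the highest document pagerank. Likewise, "matrix" is the only term that co-occurs with both other terms, so it gets the highest term pagerank. Plain lists keep the sketch dependency-free. A real system would use a sparse matrix library instead.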