Performance of self-taught documents: exploiting co-relevance structure in a document collection

  • Authors:
  • Abraham Bookstein

  • Affiliations:
  • Graduate Library School, University of Chicago

  • Venue:
  • Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 1986

Abstract

In this paper we study the behavior of an information retrieval system in which index terms are assigned at random to both documents and requests. The random indexing is then modified by a feedback mechanism, derived from a normal probability model, that is applied to both the request and the document representations. Of interest are the convergence properties of the representation vectors. After a few feedback iterations, well-defined clusters form that accurately represent the co-relevance structure among the documents; in effect, the feedback mechanism has permitted the documents to index themselves. This approach offers an interesting way to extend the dimensionality of the indexing vocabulary. Both this application and a theoretical analysis of the impact of extending the indexing vocabulary are discussed.
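
The general idea of randomly indexed documents organizing themselves under relevance feedback can be illustrated with a toy simulation. The sketch below is not the paper's method: the abstract does not give the update rule derived from the normal probability model, so a simple linear pull toward a centroid is substituted here as an assumption. The function names (simulate, spread), the grouping of documents into co-relevant sets, the real-valued random weights, and all parameter values are hypothetical and chosen only for illustration.

```python
import itertools
import math
import random


def spread(docs):
    """Return (mean within-group distance, mean between-group distance)."""
    within, between = [], []
    for (ga, va), (gb, vb) in itertools.combinations(docs, 2):
        (within if ga == gb else between).append(math.dist(va, vb))
    return sum(within) / len(within), sum(between) / len(between)


def simulate(num_groups=3, docs_per_group=5, requests_per_group=2,
             dim=8, rate=0.3, iterations=10, seed=0):
    rng = random.Random(seed)

    def rand_vec():
        # Random initial indexing: every weight is drawn independently.
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    # Assumed co-relevance structure: documents in the same group are
    # relevant to the same requests. The vectors start out unrelated to it.
    docs = [(g, rand_vec())
            for g in range(num_groups) for _ in range(docs_per_group)]
    requests = [(g, rand_vec())
                for g in range(num_groups) for _ in range(requests_per_group)]

    before = spread(docs)
    for _ in range(iterations):
        for group, query in requests:
            relevant = [vec for g, vec in docs if g == group]
            centroid = [sum(col) / len(relevant) for col in zip(*relevant)]
            # Feedback on the request: move it toward its relevant documents.
            for i in range(dim):
                query[i] += rate * (centroid[i] - query[i])
            # Feedback on the documents: move each relevant one toward
            # the updated request (assumed linear rule, not the paper's).
            for vec in relevant:
                for i in range(dim):
                    vec[i] += rate * (query[i] - vec[i])
    return before, spread(docs)


if __name__ == "__main__":
    (within_0, between_0), (within_1, between_1) = simulate()
    print("before feedback: within", within_0, "between", between_0)
    print("after feedback:  within", within_1, "between", between_1)
```

Under this assumed rule, each feedback step shrinks the difference between any two co-relevant documents by a factor of (1 - rate), so documents in the same group collapse toward a common point while different groups settle at different points. That mirrors, in a much simplified form, the cluster formation the abstract describes; it says nothing about the convergence behavior of the paper's actual model.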