Composite document extended retrieval: an overview

  • Authors: Edward A. Fox
  • Affiliations: Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA
  • Venue: SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year: 1985

Abstract

Experimental information retrieval (IR) systems, some dating back to the sixties, have demonstrated the viability of fully automatic document storage and retrieval methodologies with small to medium size bibliographic collections [72]. Many of these experimental systems utilize the vector space model in which each important term (such as a word stem) identifies a different dimension in a space, so that matrix methods and vector operations can be defined on queries and documents. Statistical techniques have been very effective, and probabilistic enhancements have given additional improvements [84]. However, the basic vector space model is oriented towards recording the essential information in the text of a title/abstract combination rather than describing more complex document structures. It is necessary to extend the model in order to handle composite documents.

On the other hand, commonly available retrieval systems that employ Boolean logic queries and utilize inverted file storage schemes can without modification accommodate such documents, albeit with somewhat less effectiveness than is possible with more sophisticated systems. Hence, it is also of interest to consider how Boolean logic systems can be extended to give better performance, especially with composite documents, and to integrate those approaches with vector methods.
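
The two retrieval styles contrasted in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the toy collection `DOCS`, the raw term-frequency weighting, the cosine measure, and the helper names (`term_vector`, `cosine`, `rank`, `build_inverted_file`, `boolean_and`) are all assumptions chosen for illustration. The first half treats each term as one dimension and ranks documents by vector similarity to the query; the second half builds an inverted file and answers a conjunctive Boolean query by intersecting posting sets.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy collection; the texts are illustrative only.
DOCS = {
    "d1": "automatic document retrieval with vector methods",
    "d2": "boolean logic queries over inverted file storage",
    "d3": "vector space retrieval and boolean logic combined",
}


def term_vector(text):
    """Map each term to its raw frequency (one dimension per term)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query):
    """Vector-style retrieval: order all documents by similarity to the query."""
    q = term_vector(query)
    scores = {doc_id: cosine(q, term_vector(text)) for doc_id, text in DOCS.items()}
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def build_inverted_file(docs):
    """Inverted file: each term points to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


INDEX = build_inverted_file(DOCS)


def boolean_and(*terms):
    """Boolean-style retrieval: documents containing every query term."""
    postings = [INDEX.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()
```

Under these assumptions, `rank("vector retrieval")` returns a full ordering of the collection, while `boolean_and("boolean", "logic")` returns an unordered set of exact matches. That difference between graded ranking and exact-match sets is the gap the abstract proposes to bridge by extending Boolean systems and integrating them with vector methods.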