Some observations on retrieval from a large technical document database

  • Authors: R Marcus

  • Venue: ACM SIGIR Forum
  • Year: 1986

Abstract

Retrieving documents from a large document database is an important problem that has been the subject of some controversy in the recent literature (see [1] and [2]). The basic question has been whether manual indexing is necessary to overcome the weaknesses of full-text retrieval. The problem with full-text retrieval is that narrowing the set of selected documents to a reasonable size requires choosing a specialized set of keywords. Such a restrictive search often misses relevant documents because of misspellings, alternate terms, typing errors, etc.

Manual indexing would seem to provide a solution by allowing a document to be retrieved using one or more human-assigned index terms describing its content. Unfortunately, experiments have shown that manual indexing is often subject to the same problems as full-text retrieval, because different people choose different index terms for the same document. An even more serious problem for a very large technical domain with many documents is the difficulty of creating and maintaining coherent indexes that can describe the whole domain.

This paper summarizes the experience gained from studying document retrieval in the domain of computer software support. The database is accessed by engineers searching to see whether a current problem has occurred and been solved in the past. An extensive study of how documents are submitted and retrieved has brought to light some general principles that can improve the performance of the system. These principles, which combine manual indexing and full-text retrieval, are discussed in the remainder of this paper.