Stepping stones and pathways: improving retrieval by chains of relationships between documents

  • Authors: Edward A. Fox; Fernando Adrian Das Neves
  • Affiliations: Virginia Polytechnic Institute and State University; Virginia Polytechnic Institute and State University
  • Venue: Stepping stones and pathways: improving retrieval by chains of relationships between documents
  • Year: 2004

Abstract

The information retrieval (IR) field has been successful in developing techniques to address many types of information needs. However, there are cases in which traditional approaches to IR are not able to produce adequate results. Examples include cases in which a small set of two or three documents is needed as an answer rather than a single document, or in which "query splitting" is required to explore the document space satisfactorily. We explore an alternative model of building and presenting retrieval results for such cases. In particular, we research effective methods for handling information needs that may:

(1) Include multiple topics. A typical query is interpreted by current IR systems as a request to retrieve documents that each discuss all topics included in that query. We propose an alternative interpretation based on query splitting. It allows queries to be interpreted as requests to retrieve sets of documents rather than individual documents, with meaningful relationships among the members of each such set.

(2) Be interpreted as parts in a chain of relationships. Suppose a query concerns topics t1 and tm. Is there a relation between topics t1 and tm that involves t2 and possibly other topics, as in {t1, t2, ..., tm}?

Thus, we propose an alternative interpretation of user queries and presentation of the results. Our interpretation has the potential to improve retrieval results whenever there is a mismatch between the user's understanding of the collection and the actual collection content. We define and refine a retrieval scheme that enhances retrieval through a framework that combines multiple sources of evidence. Query results in our interpretation are networks of document groups representing topics, each group relating to and connecting to other groups in the network that partially answer the user's information need. We devise new and more effective representations and techniques to visualize results, and incorporate the user as part of the retrieval process. We also evaluate the improvement of the query results based on multiple measures. In particular, we verify the validity of our approach through a study involving a collection of Operating Systems research papers that was specially built for this dissertation.
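
The chain-of-relationships idea in the abstract can be illustrated with a toy example. The following is a minimal sketch only, not the dissertation's method: it assumes topics are already available as groups of document identifiers, treats two topics as related when they share at least one document (an assumption made purely for illustration), and uses a plain breadth-first search to find a chain of intermediate topics between the two query topics. The data, the names `TOPIC_DOCS`, `build_topic_graph`, and `find_pathway`, and the relatedness rule are all hypothetical.

```python
from collections import deque

# Hypothetical toy data: each topic group maps to the documents assigned to it.
TOPIC_DOCS = {
    "scheduling": {"d1", "d2"},
    "real-time": {"d2", "d3"},
    "embedded": {"d3", "d4"},
    "file systems": {"d5"},
}


def build_topic_graph(topic_docs):
    """Link two topics when they share a document (an assumed notion of relation)."""
    graph = {topic: set() for topic in topic_docs}
    topics = list(topic_docs)
    for i, a in enumerate(topics):
        for b in topics[i + 1:]:
            if topic_docs[a] & topic_docs[b]:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def find_pathway(graph, start, goal):
    """Breadth-first search for the shortest chain of topics from start to goal.

    Returns the chain as a list of topic names, or None if no chain exists.
    """
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_topic_graph(TOPIC_DOCS)
pathway = find_pathway(graph, "scheduling", "embedded")
print(pathway)
```

In this toy collection no single document covers both "scheduling" and "embedded", so a conventional conjunctive query would return nothing, while the search connects the two query topics through the intermediate group "real-time". A realistic system would need a much richer measure of relatedness than shared documents.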