Finding the Most Similar Documents across Multiple Text Databases

  • Authors:
  • Clement Yu;King-Lup Liu;Wensheng Wu;Weiyi Meng;Naphtali Rishe

  • Venue:
  • ADL '99 Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries
  • Year:
  • 1999

Abstract

In this paper, we present a methodology for finding the n most similar documents across multiple text databases for any given query and any positive integer n. The methodology consists of two steps. First, the databases are ranked in a certain order. Next, documents are retrieved from the databases in that order and in a particular way. If the databases containing the n most similar documents for a given query are ranked ahead of the other databases, the methodology guarantees the retrieval of the n most similar documents for the query. A statistical method is provided to identify databases, each of which is estimated to contain at least one of the n most similar documents. A number of strategies are then presented for retrieving documents from the identified databases. Experimental results are given to illustrate the relative performance of the different strategies.
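The two-step idea in the abstract (rank the databases, then retrieve in that order) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not specify the ranking estimator or the retrieval strategies, so the per-database `estimates`, the cutoff `k`, and the function names `rank_databases` and `retrieve_top_n` are all hypothetical and used only for illustration. Similarity scores are assumed to be already computed for each document.

```python
import heapq
from typing import Dict, List, Tuple

# A document is (doc_id, similarity to the query); similarities are assumed given.
Doc = Tuple[str, float]


def rank_databases(estimates: Dict[str, float]) -> List[str]:
    """Step 1 (assumed form): order databases by an estimated usefulness score,
    highest first. How the estimate is obtained is outside this sketch."""
    return sorted(estimates, key=lambda name: estimates[name], reverse=True)


def retrieve_top_n(
    databases: Dict[str, List[Doc]],
    ranking: List[str],
    n: int,
    k: int,
) -> List[Doc]:
    """Step 2 (assumed form): search only the first k ranked databases and
    keep the n documents with the highest similarity among them."""
    candidates: List[Doc] = []
    for name in ranking[:k]:
        candidates.extend(databases[name])
    return heapq.nlargest(n, candidates, key=lambda doc: doc[1])


# Toy data: three databases with made-up similarities and made-up estimates.
databases = {
    "A": [("a1", 0.9), ("a2", 0.2)],
    "B": [("b1", 0.8), ("b2", 0.7)],
    "C": [("c1", 0.1)],
}
estimates = {"A": 0.85, "B": 0.75, "C": 0.05}

ranking = rank_databases(estimates)
result = retrieve_top_n(databases, ranking, n=3, k=2)

# Reference answer: the true top 3 over every database.
all_docs = [doc for docs in databases.values() for doc in docs]
global_top = heapq.nlargest(3, all_docs, key=lambda doc: doc[1])
```

In this toy case the three most similar documents all live in databases A and B, which are ranked ahead of C, so searching only the first two databases returns the same result as searching all three. That mirrors the condition stated in the abstract: the guarantee holds when the databases containing the n most similar documents are ranked ahead of the rest.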