Efficient and effective metasearch for a large number of text databases

  • Authors:
  • Clement Yu;Weiyi Meng;King-Lup Liu;Wensheng Wu;Naphtali Rishe

  • Affiliations:
  • Dept. of EECS, University of Illinois at Chicago, Chicago, IL;Dept. of Computer Science, SUNY - Binghamton, Binghamton, NY;Dept. of EECS, University of Illinois at Chicago, Chicago, IL;Dept. of EECS, University of Illinois at Chicago, Chicago, IL;School of Computer Science, Florida International University, Miami, FL

  • Venue:
  • Proceedings of the eighth international conference on Information and knowledge management
  • Year:
  • 1999

Abstract

Metasearch engines help ordinary users retrieve information from multiple local sources (text databases). In a metasearch engine, the contents of each local database are summarized by a representative. Each user query is evaluated against the representatives of all databases to determine which databases to search. When the number of databases is very large, say on the order of tens of thousands or more, a traditional metasearch engine may become inefficient because each query must be evaluated against too many database representatives. Furthermore, the storage requirement at the site hosting the metasearch engine can be very large. In this paper, we propose using a hierarchy of database representatives to improve efficiency. We provide an algorithm to search the hierarchy. We show that the retrieval effectiveness of our algorithm is the same as that of evaluating the user query against all database representatives. We also show that our algorithm is efficient. In addition, we propose an alternative way of allocating representatives to sites so that the storage burden on the site hosting the metasearch engine is greatly reduced.
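The abstract does not spell out the form of the representatives or the search algorithm, so the following is only an illustrative sketch of how a hierarchy of representatives could be searched without losing effectiveness. It rests on assumptions that are not taken from the paper: each representative is a map from terms to non-negative weights, a parent's representative holds the maximum weight of each term over its children, a query is scored by a dot product, and the hierarchy is explored best-first. The names `Node`, `merge`, `score`, and `search` are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    name: str
    rep: dict                      # term -> non-negative weight (assumed form)
    children: list = field(default_factory=list)


def merge(name, children):
    """Build a parent whose representative is the per-term maximum of its children."""
    rep = {}
    for child in children:
        for term, weight in child.rep.items():
            rep[term] = max(rep.get(term, 0.0), weight)
    return Node(name, rep, list(children))


def score(query, rep):
    """Dot product between query weights and a representative (assumed measure)."""
    return sum(qw * rep.get(term, 0.0) for term, qw in query.items())


def search(root, query, k):
    """Best-first search: return the k best databases and the number of
    representatives that were scored below the root."""
    tie = count()                  # tie-breaker so the heap never compares Nodes
    heap = [(-score(query, root.rep), next(tie), root)]
    selected, evaluated = [], 0
    while heap and len(selected) < k:
        neg, _, node = heapq.heappop(heap)
        if not node.children:      # a leaf is an actual database
            selected.append((node.name, -neg))
            continue
        for child in node.children:
            evaluated += 1
            heapq.heappush(heap, (-score(query, child.rep), next(tie), child))
    return selected, evaluated


# Hypothetical toy data: six databases in two groups.
leaves = [
    Node("db1", {"metasearch": 0.9}),
    Node("db2", {"metasearch": 0.4}),
    Node("db3", {"engine": 0.7}),
    Node("db4", {"database": 0.8}),
    Node("db5", {"database": 0.5, "engine": 0.6}),
    Node("db6", {"metasearch": 0.1}),
]
root = merge("root", [merge("A", leaves[:3]), merge("B", leaves[3:])])
query = {"metasearch": 1.0}

top, evaluated = search(root, query, k=2)
flat = sorted(((n.name, score(query, n.rep)) for n in leaves),
              key=lambda pair: -pair[1])[:2]
print(top, evaluated, flat)
```

Under these assumptions a parent's score can never be lower than that of any database beneath it, so a database popped from the heap is at least as good as everything still unexplored. That is why the sketch selects the same databases as scoring every representative, while skipping subtrees whose upper bound is too low.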