Efficient and effective metasearch for a large number of text databases

Authors:
Clement Yu;Weiyi Meng;King-Lup Liu;Wensheng Wu;Naphtali Rishe
Affiliations:
Dept. of EECS, University of Illinois at Chicago, Chicago, IL;Dept. of Computer Science, SUNY - Binghamton, Binghamton, NY;Dept. of EECS, University of Illinois at Chicago, Chicago, IL;Dept. of EECS, University of Illinois at Chicago, Chicago, IL;School of Computer Science, Florida International University, Miami, FL
Venue:
Proceedings of the eighth international conference on Information and knowledge management
Year:
1999

Abstract

Metasearch engines help ordinary users retrieve information from multiple local sources (text databases). In a metasearch engine, the contents of each local database are summarized by a representative. Each user query is evaluated against the representatives of all databases to determine which databases to search. When the number of databases is very large, say on the order of tens of thousands or more, a traditional metasearch engine may become inefficient, because each query must be evaluated against too many database representatives. Furthermore, the storage requirement at the site hosting the metasearch engine can be very large. In this paper, we propose using a hierarchy of database representatives to improve efficiency. We provide an algorithm to search the hierarchy. We show that the retrieval effectiveness of our algorithm is the same as that of evaluating the user query against all database representatives. We also show that our algorithm is efficient. In addition, we propose an alternative way of allocating representatives to sites so that the storage burden on the site containing the metasearch engine is much reduced.
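
The abstract does not say how representatives are built or how the hierarchy is searched, so the following is only an illustrative sketch under stated assumptions, not the paper's algorithm. It assumes that a representative maps each term to the maximum weight that term has in any document of the database, that a parent representative is the term-wise maximum of its children, and that query weights are non-negative. Under those assumptions a parent's score is never lower than any child's, so a best-first search returns the same top-k databases as scoring every database directly (ties aside). All names (`Node`, `merge`, `score`, `top_k_hierarchy`, `top_k_flat`) are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    rep: dict                      # assumed format: term -> maximum term weight
    children: list = field(default_factory=list)


def merge(name, children):
    """Build a parent whose representative is the term-wise max of its children."""
    rep = {}
    for child in children:
        for term, weight in child.rep.items():
            rep[term] = max(rep.get(term, 0.0), weight)
    return Node(name, rep, list(children))


def score(query, rep):
    """Upper bound on similarity: dot product of query weights and max weights."""
    return sum(qw * rep.get(term, 0.0) for term, qw in query.items())


def top_k_hierarchy(root, query, k):
    """Best-first search: always expand the node with the highest bound."""
    heap = [(-score(query, root.rep), 0, root)]
    counter = 1                    # tie-breaker so Node objects are never compared
    result = []
    while heap and len(result) < k:
        neg, _, node = heapq.heappop(heap)
        if not node.children:      # a leaf is an actual database
            result.append((node.name, -neg))
            continue
        for child in node.children:
            heapq.heappush(heap, (-score(query, child.rep), counter, child))
            counter += 1
    return result


def top_k_flat(databases, query, k):
    """Baseline: evaluate the query against every database representative."""
    scored = [(db.name, score(query, db.rep)) for db in databases]
    scored.sort(key=lambda pair: -pair[1])
    return scored[:k]


# Toy data, invented for illustration only.
sports = Node("sports", {"football": 0.9, "score": 0.4})
news = Node("news", {"football": 0.3, "election": 0.8})
health = Node("health", {"diet": 0.7, "virus": 0.6})
medicine = Node("medicine", {"virus": 0.9, "drug": 0.8})
root = merge("root", [merge("g1", [sports, news]), merge("g2", [health, medicine])])

query = {"virus": 1.0, "drug": 0.5}
hier = top_k_hierarchy(root, query, 2)
flat = top_k_flat([sports, news, health, medicine], query, 2)
print([name for name, _ in hier])
```

In this toy hierarchy the search never descends into group `g1`, because two databases under `g2` are returned before `g1`'s bound of zero is reached. With only four databases the saving is negligible; any benefit would come from pruning whole subtrees when there are many databases.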