Efficient and effective metasearch for a large number of text databases
Proceedings of the eighth international conference on Information and knowledge management
Towards a highly-scalable and effective metasearch engine
Proceedings of the 10th international conference on World Wide Web
Efficient and effective metasearch for text databases incorporating linkages among documents
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
A highly scalable and effective method for metasearch
ACM Transactions on Information Systems (TOIS)
Discovering the representative of a search engine
Proceedings of the tenth international conference on Information and knowledge management
Building efficient and effective metasearch engines
ACM Computing Surveys (CSUR)
Discovering the representative of a search engine
Proceedings of the eleventh international conference on Information and knowledge management
A Methodology to Retrieve Text Documents from Multiple Databases
IEEE Transactions on Knowledge and Data Engineering
A Statistical Method for Estimating the Usefulness of Text Databases
IEEE Transactions on Knowledge and Data Engineering
Distributed mining of classification rules
Knowledge and Information Systems
Comparing the performance of collection selection algorithms
ACM Transactions on Information Systems (TOIS)
Using word clusters to detect similar web documents
KSEM'06 Proceedings of the First international conference on Knowledge Science, Engineering and Management
Hi-index | 0.00 |
In this paper, we present a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, databases are ranked in a certain order. Next, documents are retrieved from the databases according to the order and in a particular way. If the databases containing the n most similar documents for a given query can be ranked ahead of other databases, the methodology will guarantee the retrieval of the n most similar documents for the query. A statistical method is provided to identify databases, each of which is estimated to contain at least one of the n most similar documents. Then, a number of strategies is presented to retrieve documents from the identified databases. Experimental results are given to illustrate the relative performance of different strategies.