Information retrieval using a singular value decomposition model of latent semantic structure
SIGIR '88 Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval
Automatic text processing: the transformation, analysis, and retrieval of information by computer
Automatic text processing: the transformation, analysis, and retrieval of information by computer
Combining the evidence of multiple query representations for information retrieval
TREC-2 Proceedings of the second conference on Text retrieval conference
Searching distributed collections with inference networks
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Learning collection fusion strategies
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Pivoted document length normalization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Experiences with selecting search engines using metasearch
ACM Transactions on Information Systems (TOIS)
A probabilistic model for distributed information retrieval
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Principles of database query processing for advanced applications
Principles of database query processing for advanced applications
Effective retrieval with distributed collections
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating database selection techniques: a testbed and experiment
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Improving two-stage ad-hoc retrieval for short queries
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Infoseek's experiences searching the internet
ACM SIGIR Forum
Comparing the performance of database selection algorithms
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A probabilistic solution to the selection and fusion problem in distributed information retrieval
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A decision-theoretic approach to database selection in networked IR
ACM Transactions on Information Systems (TOIS)
A corpus analysis approach for automatic query expansion and its extension to multiple databases
ACM Transactions on Information Systems (TOIS)
Authoritative sources in a hyperlinked environment
Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Efficient and effective metasearch for a large number of text databases
Proceedings of the eighth international conference on Information and knowledge management
Accessibility of information on the Web
intelligence
Information Retrieval Systems: Theory and Implementation
Information Retrieval Systems: Theory and Implementation
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
A Statistical Method for Estimating the Usefulness of Text Databases
IEEE Transactions on Knowledge and Data Engineering
Merging Ranks from Heterogeneous Internet Sources
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Determining Text Databases to Search in the Internet
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Server Ranking for Distributed Text Retrieval Systems on the Internet
Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA)
Finding the Most Similar Documents across Multiple Text Databases
ADL '99 Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries
Estimating the Usefulness of Search Engines
ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies
Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies
Towards a highly-scalable and effective metasearch engine
Proceedings of the 10th international conference on World Wide Web
Efficient and effective metasearch for text databases incorporating linkages among documents
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Database selection for processing k nearest neighbors queries in distributed environments
Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries
A highly scalable and effective method for metasearch
ACM Transactions on Information Systems (TOIS)
Iconic pictorial retrieval using multiple attributes and spatial relationships
Knowledge-Based Systems
An adaptive crawler for locating hidden-Web entry points
Proceedings of the 16th international conference on World Wide Web
AllInOneNews: development and evaluation of a large-scale news metasearch engine
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
A novel storage embedded application
CEA'07 Proceedings of the 2007 annual Conference on International Conference on Computer Engineering and Applications
Future Generation Computer Systems
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Foundations and Trends in Information Retrieval
Location-based context retrieval and filtering
LoCA'06 Proceedings of the Second international conference on Location- and Context-Awareness
Hi-index | 0.00 |
This paper presents a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, the contents of databases are indicated approximately by database representatives. Databases are ranked using their representatives with respect to the given query. We provide a necessary and sufficient condition to rank the databases optimally. In order to satisfy this condition, we provide three estimation methods. One estimation method is intended for short queries; the other two are for all queries. Second, we provide an algorithm, OptDocRetrv, to retrieve documents from the databases according to their rank and in a particular way. We show that if the databases containing the n most similar documents for a given query are ranked ahead of other databases, our methodology will guarantee the retrieval of the n most similar documents for the query. When the number of databases is large, we propose to organize database representatives into a hierarchy and employ a best-search algorithm to search the hierarchy. It is shown that the effectiveness of the best-search algorithm is the same as that of evaluating the user query against all database representatives.