A Methodology to Retrieve Text Documents from Multiple Databases

Authors:
Clement Yu; King-Lup Liu; Weiyi Meng; Zonghuan Wu; Naphtali Rishe
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2002

References (32):
Information retrieval using a singular value decomposition model of latent semantic structure. SIGIR '88 Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval.
Automatic text processing: the transformation, analysis, and retrieval of information by computer.
Combining the evidence of multiple query representations for information retrieval. TREC-2 Proceedings of the second conference on Text retrieval conference.
Searching distributed collections with inference networks. SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval.
Learning collection fusion strategies. SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval.
Pivoted document length normalization. SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval.
Experiences with selecting search engines using metasearch. ACM Transactions on Information Systems (TOIS).
A probabilistic model for distributed information retrieval. Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval.
Principles of database query processing for advanced applications.
Real life information retrieval: a study of user queries on the Web. ACM SIGIR Forum.
Effective retrieval with distributed collections. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval.
Evaluating database selection techniques: a testbed and experiment. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval.
Improving two-stage ad-hoc retrieval for short queries. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval.
The anatomy of a large-scale hypertextual Web search engine. WWW7 Proceedings of the seventh international conference on World Wide Web 7.
Infoseek's experiences searching the internet. ACM SIGIR Forum.
Comparing the performance of database selection algorithms. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval.
A probabilistic solution to the selection and fusion problem in distributed information retrieval. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval.
A decision-theoretic approach to database selection in networked IR. ACM Transactions on Information Systems (TOIS).
A corpus analysis approach for automatic query expansion and its extension to multiple databases. ACM Transactions on Information Systems (TOIS).
Authoritative sources in a hyperlinked environment. Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms.
Efficient and effective metasearch for a large number of text databases. Proceedings of the eighth international conference on Information and knowledge management.
Accessibility of information on the Web. intelligence.
Information Retrieval Systems: Theory and Implementation.
Introduction to Modern Information Retrieval.
A Statistical Method for Estimating the Usefulness of Text Databases. IEEE Transactions on Knowledge and Data Engineering.
Merging Ranks from Heterogeneous Internet Sources. VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases.
Determining Text Databases to Search in the Internet. VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases.
Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases.
Server Ranking for Distributed Text Retrieval Systems on the Internet. Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA).
Finding the Most Similar Documents across Multiple Text Databases. ADL '99 Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries.
Estimating the Usefulness of Search Engines. ICDE '99 Proceedings of the 15th International Conference on Data Engineering.
Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies.

Cited by (12):
Towards a highly-scalable and effective metasearch engine. Proceedings of the 10th international conference on World Wide Web.
Efficient and effective metasearch for text databases incorporating linkages among documents. SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data.
Database selection for processing k nearest neighbors queries in distributed environments. Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries.
A highly scalable and effective method for metasearch. ACM Transactions on Information Systems (TOIS).
Iconic pictorial retrieval using multiple attributes and spatial relationships. Knowledge-Based Systems.
An adaptive crawler for locating hidden-Web entry points. Proceedings of the 16th international conference on World Wide Web.
AllInOneNews: development and evaluation of a large-scale news metasearch engine. Proceedings of the 2007 ACM SIGMOD international conference on Management of data.
A novel storage embedded application. CEA'07 Proceedings of the 2007 annual Conference on International Conference on Computer Engineering and Applications.
Ontology-based content organization and retrieval for SCORM-compliant teaching materials in data grids. Future Generation Computer Systems.
Modeling search response time. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval.
Federated Search. Foundations and Trends in Information Retrieval.
Location-based context retrieval and filtering. LoCA'06 Proceedings of the Second international conference on Location- and Context-Awareness.

Abstract

This paper presents a methodology for finding the n most similar documents across multiple text databases for any given query and any positive integer n. The methodology consists of two steps. First, the contents of each database are approximated by a database representative, and the databases are ranked by comparing their representatives with the given query. We provide a necessary and sufficient condition for ranking the databases optimally and, to satisfy this condition, three estimation methods: one intended for short queries and two that apply to all queries. Second, we provide an algorithm, OptDocRetrv, that retrieves documents from the databases according to their rank and in a particular way. We show that if the databases containing the n most similar documents for a given query are ranked ahead of the other databases, our methodology guarantees the retrieval of those n documents. When the number of databases is large, we propose organizing the database representatives into a hierarchy and searching it with a best-search algorithm, and we show that this is as effective as evaluating the user query against all database representatives.
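The retrieval guarantee described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's OptDocRetrv. It assumes cosine similarity over sparse term vectors, and it assumes the databases are already ranked in descending order of the similarity of their most similar document. That ordering is computed here by an oracle (`rank_by_max_similarity`) rather than estimated from database representatives. The stopping rule (invoke databases until at least n documents reach the smallest best-document similarity seen so far) is our own illustrative choice. All function and type names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]                 # sparse vector: term -> weight (assumed representation)
Database = Tuple[str, Dict[str, Vector]]  # (database name, {doc id: vector})
Hit = Tuple[float, str, str]              # (similarity, database name, doc id)


def cosine(q: Vector, d: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors (assumed similarity function)."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def rank_by_max_similarity(q: Vector, dbs: List[Database]) -> List[Database]:
    """Hypothetical oracle ranking: descending similarity of each database's best
    document. It stands in for estimates made from database representatives."""
    return sorted(
        dbs,
        key=lambda db: -max((cosine(q, v) for v in db[1].values()), default=0.0),
    )


def retrieve_top_n(q: Vector, ranked_dbs: List[Database], n: int) -> List[Hit]:
    """Invoke databases in rank order until at least n documents reach the
    threshold, i.e. the smallest best-document similarity seen so far."""
    invoked: List[Hit] = []  # every scored document from the invoked databases
    threshold = math.inf
    for name, docs in ranked_dbs:
        scored = [(cosine(q, v), name, doc_id) for doc_id, v in docs.items()]
        if not scored:
            continue
        invoked.extend(scored)
        threshold = min(threshold, max(s for s, _, _ in scored))
        # Documents the invoked databases would return at the current threshold.
        above = [hit for hit in invoked if hit[0] >= threshold]
        if len(above) >= n:
            break
    else:
        # Every database was invoked without reaching n: fall back to all scored documents.
        above = invoked
    return sorted(above, key=lambda h: (-h[0], h[1], h[2]))[:n]


def global_top_n(q: Vector, dbs: List[Database], n: int) -> List[Hit]:
    """Brute-force baseline that scores every document in every database."""
    hits = [(cosine(q, v), name, doc_id) for name, docs in dbs for doc_id, v in docs.items()]
    return sorted(hits, key=lambda h: (-h[0], h[1], h[2]))[:n]
```

The early stop is safe under the assumed ordering. Every database not yet invoked ranks below the last invoked one, so none of its documents can exceed the current threshold (ties aside). The n best documents already at or above the threshold are therefore the global top n. With a weaker ordering, for example one where a database with a lower best-document similarity is invoked first, this particular stopping rule can miss documents.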