A Statistical Method for Estimating the Usefulness of Text Databases

Authors:
King-Lup Liu; Clement Yu; Weiyi Meng; Wensheng Wu; Naphtali Rishe
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2002

Abstract

Searching for desired data is one of the most common uses of the Internet, yet no single search engine is capable of searching all data on the Internet. An interface that invokes multiple search engines for each user query has the potential to satisfy more users. When such an interface covers a large number of search engines, however, invoking all of them for every query is often not cost-effective: sending the query to many useless search engines creates unnecessary network traffic, and searching those engines wastes their local resources. This problem can be avoided if the usefulness of each search engine with respect to a given query can be predicted in advance. In this paper, we present a statistical method to estimate the usefulness of a search engine for any given query. Here, the usefulness of a search engine for a given query is defined as a combination of two quantities: the number of documents in the search engine that are sufficiently similar to the query, and the average similarity of these documents. Experimental results indicate that our estimation method is much more accurate than existing methods.
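To make the two-part usefulness definition concrete, the sketch below computes it for a similarity threshold T in two ways. `usefulness` computes it exactly when every document's similarity to the query is known. `estimate_usefulness` approximates it from per-term statistics alone. The estimator is a simplified, generating-function-style approach chosen for illustration. It assumes a dot-product similarity, assumes query terms occur independently in documents, and assumes each present term contributes its average document weight. It is not necessarily the estimation method of this paper. The function names, the strict `> T` test, and the `(u, p, w)` term statistics are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def usefulness(similarities: List[float], threshold: float) -> Tuple[int, float]:
    """Exact (count, average similarity) of documents whose similarity exceeds threshold."""
    above = [s for s in similarities if s > threshold]  # assumption: strict inequality
    if not above:
        return 0, 0.0
    return len(above), sum(above) / len(above)


def estimate_usefulness(
    n_docs: int,
    terms: List[Tuple[float, float, float]],
    threshold: float,
) -> Tuple[float, float]:
    """Estimate (count, average similarity) from hypothetical per-term statistics.

    Each entry of `terms` is (u, p, w): the query weight of the term, the
    probability that a document contains it, and the term's average weight in
    the documents that contain it. Terms are assumed to occur independently,
    so the similarity distribution is the expansion of the product over terms
    of (p * X^(u * w) + (1 - p)).
    """
    dist: Dict[float, float] = {0.0: 1.0}  # similarity value -> probability
    for u, p, w in terms:
        nxt: Dict[float, float] = defaultdict(float)
        for sim, prob in dist.items():
            nxt[sim + u * w] += prob * p   # document contains the term
            nxt[sim] += prob * (1.0 - p)   # document lacks the term
        dist = dict(nxt)

    # Keep only the part of the distribution above the threshold.
    tail = [(sim, prob) for sim, prob in dist.items() if sim > threshold]
    mass = sum(prob for _, prob in tail)
    if mass == 0.0:
        return 0.0, 0.0
    est_count = n_docs * mass
    est_avg_sim = sum(sim * prob for sim, prob in tail) / mass
    return est_count, est_avg_sim


if __name__ == "__main__":
    print(usefulness([0.1, 0.5, 0.7, 0.3], threshold=0.3))
    print(estimate_usefulness(1000, [(1.0, 0.2, 0.4), (1.0, 0.1, 0.5)], threshold=0.3))
```

For the made-up statistics in the usage lines, the estimator expects about 280 of the 1,000 documents to exceed T = 0.3, with an average similarity of roughly 0.464. Only the term-level statistics are needed, which is what allows a broker to rank search engines without contacting them. Replacing each term's single average weight with a finer model of its weight distribution would generally improve accuracy, at the cost of storing more statistics per term.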