Discovering the representative of a search engine

Authors:
King-Lup Liu; Clement Yu; Weiyi Meng
Affiliations:
DePaul University, Chicago, IL; University of Illinois at Chicago, Chicago, IL; SUNY-Binghamton, Binghamton, NY
Venue:
Proceedings of the eleventh international conference on Information and knowledge management
Year:
2002

Abstract

With so many search engines on the Internet, it is difficult for a person to determine which ones could serve their information needs. A common solution is to construct a metasearch engine on top of the search engines. Upon receiving a user query, the metasearch engine sends it to those underlying search engines that are likely to return the desired documents for the query. The selection algorithm that decides whether a search engine should be sent the query typically relies on the search-engine representative, which contains characteristic information about the search engine's database. However, an underlying search engine may not be willing to provide the needed information to the metasearch engine. This paper shows that the needed information can be estimated from an uncooperative search engine with good accuracy. Two pieces of information that permit accurate search engine selection are the number of documents indexed by the search engine and the maximum weight of each term. We present techniques for estimating both.
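
To make the role of the representative concrete, here is a minimal Python sketch of how a metasearch engine might use one for selection. It is an illustration under stated assumptions, not the selection or estimation method from the paper: the names `Representative`, `similarity_upper_bound`, and `select_engines`, the dot-product similarity, and the threshold rule are all hypothetical. It relies only on the fact that, with non-negative weights, the sum over query terms of query weight times maximum term weight bounds the dot-product similarity of any single document in that database.

```python
from dataclasses import dataclass, field


@dataclass
class Representative:
    """Hypothetical summary of one search engine's database."""
    num_docs: int  # number of indexed documents (carried here, unused in this sketch)
    max_weight: dict[str, float] = field(default_factory=dict)  # term -> maximum weight


def similarity_upper_bound(query: dict[str, float], rep: Representative) -> float:
    """Upper bound on the dot-product similarity of any document to the query.

    Assumes non-negative weights; a term missing from the representative
    contributes nothing.
    """
    return sum(w * rep.max_weight.get(term, 0.0) for term, w in query.items())


def select_engines(
    query: dict[str, float],
    reps: dict[str, Representative],
    threshold: float,
) -> list[tuple[str, float]]:
    """Keep engines whose bound reaches the threshold, best bound first."""
    scored = [(name, similarity_upper_bound(query, rep)) for name, rep in reps.items()]
    selected = [(name, score) for name, score in scored if score >= threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up representatives for two hypothetical engines.
    reps = {
        "engine_a": Representative(1000, {"metasearch": 0.8, "engine": 0.5}),
        "engine_b": Representative(50000, {"engine": 0.2}),
    }
    query = {"metasearch": 1.0, "engine": 1.0}
    print(select_engines(query, reps, threshold=0.5))
```

An engine whose bound falls below the threshold cannot contain a document that reaches it under this similarity measure, so skipping that engine loses nothing. How the document count enters the selection decision is not described in the abstract, so the sketch only stores it.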