Server selection methods in hybrid portal search

Authors:
David Hawking; Paul Thomas
Affiliations:
CSIRO ICT Centre, Canberra, Australia; Australian National University
Venue:
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2005

Abstract

The TREC .GOV collection makes a valuable web testbed for distributed information retrieval methods because it is naturally partitioned and includes 725 web-oriented queries with judged answers. It can usefully model aspects of government and large corporate portals. Analysis of the .gov data shows that a purely distributed approach would not be feasible for providing search on a .gov portal because of the large number (17,000+) of web sites and the high proportion that do not provide a search interface. An alternative hybrid approach, combining distributed and centralized techniques, is proposed, and server selection methods are evaluated within this framework using web-oriented evaluation methodology. A number of well-known algorithms are compared against two representatives, highest anchor ranked page (HARP) and anchor weighted sum (AWSUM), of a family of new selection methods that use link anchor text extracted from an auxiliary crawl to describe sites that are not themselves crawled. Of the previously published methods, ReDDE substantially outperformed three variants of CORI and also outperformed a method based on extended Kullback-Leibler (KL) divergence, except on topic distillation. HARP and AWSUM performed best overall but were outperformed on the topic distillation task by extended KL divergence.
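
The abstract names HARP and AWSUM but does not define them, so the following is only a minimal sketch of one plausible reading of the two names, not the authors' implementation. It assumes that each page receives a score from the anchor text pointing at it, that HARP orders servers by their single best-scoring page, and that AWSUM orders servers by the sum of their pages' scores. The term-overlap scoring, the function names, and the example URLs are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def anchor_scores(query, anchors):
    """Score target URLs by query-term overlap with incoming anchor text.

    `anchors` is a list of (target_url, anchor_text) pairs, standing in for
    links harvested by an auxiliary crawl. The overlap count is a toy
    substitute for a real ranking function (an assumption of this sketch).
    """
    terms = set(query.lower().split())
    scores = defaultdict(float)
    for url, text in anchors:
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scores[url] += overlap
    return dict(scores)


def harp(page_scores):
    """HARP-style ordering: servers ranked by their highest-scoring page."""
    ranked_pages = sorted(page_scores, key=lambda u: (-page_scores[u], u))
    servers = []
    for url in ranked_pages:
        host = urlparse(url).netloc
        if host not in servers:  # keep the first (best) appearance only
            servers.append(host)
    return servers


def awsum(page_scores):
    """AWSUM-style ordering: servers ranked by the sum of their page scores."""
    totals = defaultdict(float)
    for url, score in page_scores.items():
        totals[urlparse(url).netloc] += score
    return sorted(totals, key=lambda h: (-totals[h], h))


# Hypothetical anchor data: server "a" has many moderately matching pages,
# server "b" has one strongly matching page, server "c" does not match.
anchors = [
    ("http://a.example.gov/p1", "tax forms"),
    ("http://a.example.gov/p2", "tax forms"),
    ("http://a.example.gov/p3", "tax forms"),
    ("http://b.example.gov/forms", "tax forms"),
    ("http://b.example.gov/forms", "download tax forms"),
    ("http://c.example.gov/", "weather report"),
]
scores = anchor_scores("tax forms", anchors)
print(harp(scores))   # server with the single best page comes first
print(awsum(scores))  # server with the largest total comes first
```

With this toy data the two orderings disagree: the HARP-style ranking puts the server holding the single best page first, while the AWSUM-style ranking favours the server with more matching pages in total. Neither method needs the selected servers to have been crawled themselves, only the anchor text that points at them.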