Guiding queries to information sources with InfoBeacons

Authors:
Brian F. Cooper
Affiliations:
Georgia Institute of Technology
Venue:
Proceedings of the 5th ACM/IFIP/USENIX international conference on Middleware
Year:
2004

Abstract

The Internet provides a wealth of useful information in a vast number of dynamic information sources, but it is difficult to determine which sources are useful for a given query. Most existing techniques either require explicit source cooperation (for example, by exporting data summaries) or build a relatively static source characterization (for example, by assigning a topic to the source). We present a system, called InfoBeacons, that takes a different approach: data and sources are left "as is," and a peer-to-peer network of beacons uses past query results to "guide" queries to sources, which perform the actual query processing. This approach has several advantages, including minimal changes to sources, tolerance of dynamism and heterogeneity, and the ability to scale to large numbers of sources. We present the architecture of the system and discuss the advantages of our design. We then focus on how a beacon can choose good sources for a query despite the loose coupling of beacons to sources. Beacons cache responses to previous queries and adapt the cache to changes at the source. The cache is then used to select good sources for future queries. We discuss results from a detailed experimental study of our beacon prototype, which demonstrates that our "loosely coupled" approach is effective: a beacon has to contact at most sixty percent of the sources contacted by existing, tightly coupled approaches, while providing results of equivalent or better relevance to queries.
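
The caching idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the `Beacon` class, the additive term-weight score, the `decay` aging factor, and the toy `make_source` helper are all assumptions introduced here for illustration. The sketch only shows the general shape of the technique: remember what each source returned for past queries, age old observations so the cache can follow changes at the source, and rank sources for a new query from the cache alone.

```python
class Beacon:
    """Toy beacon that learns about sources only from their past results."""

    def __init__(self, sources, decay=0.5):
        # sources: name -> callable(query) returning a list of result strings.
        self.sources = sources
        # Hypothetical aging factor: older observations count for less.
        self.decay = decay
        # Per-source cache: term -> weight learned from earlier responses.
        self.cache = {name: {} for name in sources}

    def score(self, name, query):
        # Assumed scoring rule: sum the cached weights of the query terms.
        weights = self.cache[name]
        return sum(weights.get(t, 0.0) for t in query.lower().split())

    def rank(self, query):
        # Best-scoring sources first; ties keep insertion order.
        return sorted(self.sources, key=lambda n: self.score(n, query), reverse=True)

    def route(self, query, k=1):
        # Forward the query to the top-k sources and learn from what comes back.
        results = []
        for name in self.rank(query)[:k]:
            docs = self.sources[name](query)
            self._update(name, query, docs)
            results.extend(docs)
        return results

    def _update(self, name, query, docs):
        # Blend the aged old weight with the number of results containing the term.
        weights = self.cache[name]
        for term in query.lower().split():
            hits = sum(1 for d in docs if term in d.lower().split())
            weights[term] = self.decay * weights.get(term, 0.0) + hits


def make_source(docs):
    """Hypothetical source: returns documents sharing a word with the query."""
    def search(query):
        terms = set(query.lower().split())
        return [d for d in docs if terms & set(d.lower().split())]
    return search


sources = {
    "music": make_source(["jazz piano trio", "rock guitar solo"]),
    "cooking": make_source(["pasta sauce recipe", "slow roast recipe"]),
}
beacon = Beacon(sources)
beacon.route("recipe", k=2)    # cold start: contact every source once
print(beacon.rank("recipe"))   # the cooking source now ranks first
```

With an empty cache every source scores zero, so this sketch contacts all sources on the first query. How a real beacon handles cold start, and how it weighs old observations against new ones, are design choices that the abstract does not specify.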