Query workload-aware overlay construction using histograms

Authors:
Georgia Koloniari;Yannis Petrakis;Evaggelia Pitoura;Thodoris Tsotsos
Affiliations:
University of Ioannina, Greece;University of Ioannina, Greece;University of Ioannina, Greece;University of Ioannina, Greece
Venue:
Proceedings of the 14th ACM international conference on Information and knowledge management
Year:
2005

Abstract

Peer-to-peer (p2p) systems offer an efficient means of data sharing among a dynamically changing set of a large number of autonomous nodes. Each node in a p2p system is connected with a small number of other nodes, thus creating an overlay network of nodes. A query posed at a node is routed through the overlay network towards nodes hosting data items that satisfy it. In this paper, we consider building overlays that exploit the query workload so that nodes are clustered based on their results to a given query workload. The motivation is to create overlays where nodes that match a large number of similar queries are a few links apart. Query frequency is also taken into account so that popular queries have a greater effect on the formation of the overlay than unpopular ones. We focus on range selection queries and use histograms to estimate the query results of each node. Then, nodes are clustered based on the similarity of their histograms. To this end, we introduce a workload-aware edit distance metric between histograms that takes into account the query workload. Our experimental results show that workload-aware overlays increase the percentage of query results returned for a given number of nodes visited as compared to both random (i.e., unclustered) overlays and non-workload-aware clustered overlays (i.e., overlays that cluster nodes based solely on the nodes' content).
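The idea of comparing nodes by their estimated answers to a weighted query workload can be illustrated with a small sketch. The abstract does not define the workload-aware edit distance, so the code below is not the authors' metric: it assumes equi-width histograms, a uniform spread of values inside each bucket, and a simple frequency-weighted difference of estimated result sizes. The names estimate_range and workload_distance, and the sample data, are hypothetical.

```python
def estimate_range(hist, lo, hi, domain_min, domain_max):
    """Estimate how many items fall in [lo, hi) from an equi-width histogram.

    Assumption: values are spread uniformly inside each bucket.
    """
    width = (domain_max - domain_min) / len(hist)
    total = 0.0
    for i, count in enumerate(hist):
        b_lo = domain_min + i * width
        b_hi = b_lo + width
        # Length of the intersection between the query range and this bucket.
        overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
        total += count * overlap / width
    return total


def workload_distance(hist_a, hist_b, workload, domain_min, domain_max):
    """Frequency-weighted difference between two nodes' estimated results.

    workload is a list of ((lo, hi), frequency) pairs. This is an
    illustrative stand-in, not the metric defined in the paper.
    """
    total_freq = sum(freq for _, freq in workload)
    dist = 0.0
    for (lo, hi), freq in workload:
        ra = estimate_range(hist_a, lo, hi, domain_min, domain_max)
        rb = estimate_range(hist_b, lo, hi, domain_min, domain_max)
        # Popular queries contribute more to the distance.
        dist += (freq / total_freq) * abs(ra - rb)
    return dist


# Hypothetical data: three nodes over the domain [0, 100), four buckets each.
node_a = [10, 0, 0, 10]
node_b = [10, 0, 0, 0]
node_c = [0, 0, 0, 10]

# The range [0, 25) is queried nine times as often as [75, 100).
workload = [((0, 25), 9), ((75, 100), 1)]

d_ab = workload_distance(node_a, node_b, workload, 0, 100)
d_ac = workload_distance(node_a, node_c, workload, 0, 100)
print(d_ab, d_ac)
```

Under this sample workload, node_a comes out closer to node_b than to node_c, because the two agree on the popular range, even though node_a differs from each of them by the same amount when content alone is compared. A clustering procedure built on such a distance would therefore tend to place node_a and node_b near each other in the overlay.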