Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Data Mining and Knowledge Discovery
IEEE Internet Computing
Analyzing peer-to-peer traffic across large networks
Proceedings of the 2nd ACM SIGCOMM Workshop on Internet measurment
The chatty web: emergent semantics through gossiping
WWW '03 Proceedings of the 12th international conference on World Wide Web
Peer-to-peer information retrieval using self-organizing semantic overlay networks
Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications
Hierarchical dwarfs for the rollup cube
DOLAP '03 Proceedings of the 6th ACM international workshop on Data warehousing and OLAP
I tube, you tube, everybody tubes: analyzing the world's largest user generated content video system
Proceedings of the 7th ACM SIGCOMM conference on Internet measurement
Querying the internet with PIER
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Enhancing P2P file-sharing with an internet-scale query processor
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Hi-index | 0.00 |
In this paper we describe HIS , a system that enables efficient storage and querying of data organized into concept hierarchies and dispersed over a network. Our scheme utilizes an adaptive algorithm that automatically adjusts the level of indexing according to the granularity of the incoming queries, without assuming any prior knowledge of the query workload. Efficient roll-up and drill-down operations increase the exact-match query ratio by shifting to the most favorable hierarchy level. Combined with soft-state indices created after query misses, our system achieves maximization of performance by minimizing query flooding. Extensive experimental evaluations show that, on top of the advantages that a distributed storage offers, our method answers the large majority of incoming queries without flooding the network and at the same time it manages to preserve the hierarchical nature of data. It shows remarkable performance especially for skewed workloads, which are frequently documented in the majority of Internet-scale applications. These characteristics are maintained even after sudden shifts in the workload.