Chord: A scalable peer-to-peer lookup service for internet applications
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
A scalable content-addressable network
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Data Management: NetCDF: an Interface for Scientific Data Access
IEEE Computer Graphics and Applications
Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems
Middleware '01 Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg
The Statistical Properties of Hoast Load
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Canon in G Major: Designing DHTs with Hierarchical Structure
ICDCS '04 Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'04)
Brief announcement: prefix hash tree
Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing
Mercury: supporting scalable multi-attribute range queries
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
Workload-Aware Load Balancing for Clustered Web Servers
IEEE Transactions on Parallel and Distributed Systems
Dynamo: amazon's highly available key-value store
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
A demonstration of SciDB: a science-oriented DBMS
Proceedings of the VLDB Endowment
Cassandra: a decentralized structured storage system
ACM SIGOPS Operating Systems Review
Overview of sciDB: large scale array storage, processing and analysis
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Replication, load balancing and efficient range query processing in DHTs
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Galileo: A Framework for Distributed Storage of High-Throughput Data Streams
UCC '11 Proceedings of the 2011 Fourth IEEE International Conference on Utility and Cloud Computing
Expressive Query Support for Multidimensional Data in Distributed Hash Tables
UCC '12 Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing
Future Generation Computer Systems
Polygon-Based Query Evaluation over Geospatial Data Using Distributed Hash Tables
UCC '13 Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing
Hi-index | 0.00 |
The proliferation of observational devices and sensors with networking capabilities has led to growth in both the rates and sources of data that ultimately contribute to extreme-scale data volumes. Datasets generated in such settings are often multidimensional, with each dimension accounting for a feature of interest. We posit that efficient evaluation of queries over such datasets must account for both the distribution of data values and the patterns in the queries themselves. Configuring query evaluation by hand is infeasible given the data volumes, dimensionality, and the rates at which new data and queries arrive. In this paper, we describe our algorithm to autonomously improve query evaluations over voluminous, distributed datasets. Our approach autonomously tunes for the most dominant query patterns and distribution of values across a dimension. We evaluate our algorithm in the context of our system, Galileo, which is a hierarchical distributed hash table used for managing geospatial, time-series data. Our system strikes a balance between memory utilization, fast evaluations, and search space reductions. Empirical evaluations reported here are performed on a dataset that is multidimensional and comprises a billion files. The schemes described in this work are broadly applicable to any system that leverages distributed hash tables as a storage mechanism.