Exploiting geospatial and chronological characteristics in data streams to enable efficient storage and retrievals

Authors:
Matthew Malensek; Sangmi Lee Pallickara; Shrideep Pallickara
Venue:
Future Generation Computer Systems
Year:
2013

Abstract

We describe the design of Galileo, a high-throughput storage system for data streams generated in observational settings. To cope with data volumes, Galileo's shared-nothing architecture supports incremental assimilation of nodes while accounting for heterogeneity in their capabilities. To store and retrieve data efficiently, Galileo accounts for the geospatial and chronological characteristics of these time-series observational data streams. Our benchmarks demonstrate that Galileo supports high-throughput storage and efficient retrieval of specific portions of large datasets while supporting different types of queries.