Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Active disks: programming model, algorithms and evaluation
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Chimera: AVirtual Data System for Representing, Querying, and Automating Data Derivation
SSDBM '02 Proceedings of the 14th International Conference on Scientific and Statistical Database Management
The many faces of publish/subscribe
ACM Computing Surveys (CSUR)
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
BitDew: a programmable environment for large-scale data management and distribution
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Extending the EGEE Grid with XtremWeb-HEP Desktop Grids
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Twister: a runtime for iterative MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
iRODS Primer: integrated Rule-Oriented Data System
iRODS Primer: integrated Rule-Oriented Data System
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Large-scale incremental processing using distributed transactions and notifications
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Swift: A language for distributed parallel scripting
Parallel Computing
Massively-parallel stream processing under QoS constraints with Nephele
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Hi-index | 0.00 |
Data-intensive science offers new opportunities for innovation and discoveries, provided that large datasets can be handled efficiently. Data management for data-intensive science applications is challenging; requiring support for complex data life cycles, coordination across multiple sites, fault tolerance, and scalability to support tens of sites and petabytes of data. In this paper, we argue that data management for data-intensive science applications requires a fundamentally different management approach than the current ad-hoc task centric approach. We propose Active Data, a fundamentally novel paradigm for data life cycle management. Active Data follows two principles: data-centric and event-driven. We report on the Active Data programming model and its preliminary implementation, and discuss the benefits and limitations of the approach on recognized challenging data-intensive science use-cases.