Continuous analytics over discontinuous streams

Authors:
Sailesh Krishnamurthy;Michael J. Franklin;Jeffrey Davis;Daniel Farina;Pasha Golovko;Alan Li;Neil Thombre
Affiliations:
Truviso, Inc., Foster City, CA, USA;Truviso, Inc., Foster City, CA, USA;Truviso, Inc., Foster City, CA, USA;Truviso, Inc., Foster City, CA, USA;Truviso, Inc., Foster City, CA, USA;Truviso, Inc., Foster City, CA, USA;Truviso, Inc., Foster City, CA, USA
Venue:
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Year:
2010

Abstract

Continuous analytics systems that enable query processing over streams of data have emerged as key solutions for dealing with massive data volumes and demands for low latency. These systems have been heavily influenced by an assumption that data streams can be viewed as sequences of data that arrive more or less in order. The reality, however, is that streams are often not so well behaved, and disruptions of various sorts are endemic. We argue, therefore, that stream processing needs a fundamental rethink, and we advocate a unified approach toward continuous analytics over discontinuous streaming data. Our approach is based on a simple insight: using techniques inspired by data-parallel query processing, queries can be performed in parallel over independent sub-streams with arbitrary time ranges, generating partial results. The consolidation of the partial results over each sub-stream can then be deferred, on demand, until the results are actually used. In this paper, we describe how the Truviso Continuous Analytics system implements this type of order-independent processing. Not only does the approach provide the first real solution to the problem of processing streaming data that arrives arbitrarily late, it also serves as a critical building block for solutions to a host of hard problems such as parallelism, recovery, transactional consistency, high availability, failover, and replication.
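
The insight described in the abstract can be illustrated with a minimal Python sketch. It is not the Truviso implementation, which the abstract does not detail. It assumes a simple windowed count/sum aggregate whose partial state can be merged, and the names `Partial`, `process_substream`, and `consolidate` are hypothetical, introduced only for illustration. Each sub-stream is aggregated on its own, in whatever order its events arrive, and the partial results are combined only when a reader asks for them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Partial:
    """Mergeable partial aggregate (hypothetical): state for count and sum."""
    count: int = 0
    total: float = 0.0

    def add(self, value):
        self.count += 1
        self.total += value

    def merge(self, other):
        return Partial(self.count + other.count, self.total + other.total)


def process_substream(events, window_size):
    """Aggregate one sub-stream independently; events may be in any order.

    Each event is a (timestamp, value) pair. Returns {window_start: Partial}.
    """
    partials = defaultdict(Partial)
    for ts, value in events:
        window_start = (ts // window_size) * window_size
        partials[window_start].add(value)
    return dict(partials)


def consolidate(partial_sets):
    """Merge the stored partial results of all sub-streams at read time."""
    merged = defaultdict(Partial)
    for partials in partial_sets:
        for window_start, p in partials.items():
            merged[window_start] = merged[window_start].merge(p)
    return dict(merged)


# An on-time sub-stream and a late one covering overlapping time ranges.
on_time = [(1, 10.0), (4, 20.0), (12, 5.0)]
late = [(3, 30.0), (11, 15.0)]

# Each sub-stream is processed on its own; the two calls could run in parallel.
stored = [process_substream(on_time, 10), process_substream(late, 10)]

# Consolidation is deferred until the result is actually requested.
result = consolidate(stored)
```

Because `merge` is associative and commutative in this sketch, the late sub-stream can be processed whenever it arrives without revisiting the results already computed for the on-time data.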