ACM Transactions on Database Systems (TODS)
Parallel database systems: the future of high performance database systems
Communications of the ACM
Transaction Processing: Concepts and Techniques
Transaction Processing: Concepts and Techniques
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total
ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
FAD, a Powerful and Simple Database Language
VLDB '87 Proceedings of the 13th International Conference on Very Large Data Bases
Highly available, fault-tolerant, parallel dataflows
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Flexible time management in data stream systems
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Fault-tolerance in the Borealis distributed stream processing system
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Out-of-order processing: a new architecture for high-performance stream systems
Proceedings of the VLDB Endowment
How soccer players would do stream joins
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Bistro data feed management system
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Update propagation in a streaming warehouse
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Complex event processing with T-REX
Journal of Systems and Software
Revisiting formal ordering in data stream querying
Proceedings of the 27th Annual ACM Symposium on Applied Computing
Transactional stream processing
Proceedings of the 15th International Conference on Extending Database Technology
Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Towards benchmarking stream data warehouses
Proceedings of the fifteenth international workshop on Data warehousing and OLAP
Predictive analytics with surveillance big data
Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data
Fast data in the era of big data: Twitter's real-time related query suggestion architecture
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Continuous query processing with concurrency control: reading updatable resources consistently
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Data stream processing with concurrency control
ACM SIGAPP Applied Computing Review
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Discretized streams: fault-tolerant streaming computation at scale
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Hi-index | 0.00 |
Continuous analytics systems that enable query processing over steams of data have emerged as key solutions for dealing with massive data volumes and demands for low latency. These systems have been heavily influenced by an assumption that data streams can be viewed as sequences of data that arrived more or less in order. The reality, however, is that streams are not often so well behaved and disruptions of various sorts are endemic. We argue, therefore, that stream processing needs a fundamental rethink and advocate a unified approach toward continuous analytics over discontinuous streaming data. Our approach is based on a simple insight - using techniques inspired by data parallel query processing, queries can be performed over independent sub-streams with arbitrary time ranges in parallel, generating partial results. The consolidation of the partial results over each sub-stream can then be deferred to the time at which the results are actually used on an on-demand basis. In this paper, we describe how the Truviso Continuous Analytics system implements this type of order-independent processing. Not only does the approach provide the first real solution to the problem of processing streaming data that arrives arbitrarily late, it also serves as a critical building block for solutions to a host of hard problems such as parallelism, recovery, transactional consistency, high availability, failover, and replication.