Applications based on event processing are often designed to continuously evaluate sets of events defined by sliding time windows. Solutions that execute long-running continuous queries in memory show their limits in applications characterized by rapid growth in the number of available sources that continuously produce new events at high rates (e.g., intrusion detection systems and algorithmic trading). Problems arise from the complexity of maintaining large volumes of events in memory for continuous elaboration, and from the difficulty of managing the network of processing nodes at run time. A batch approach to this kind of computation provides a viable solution for scenarios characterized by infrequent computations over very large time windows. In this paper we propose a model for batch processing of time-window event computations that allows the definition of multiple metrics for performance optimization. These metrics specifically take into account the organization of input data in order to minimize its impact on computation latency. The model is then instantiated on Hadoop, a batch processing engine based on the MapReduce paradigm, and a set of strategies for efficiently arranging input data is described and evaluated.
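To make the core idea concrete, the following is a minimal sketch (not the paper's implementation) of how a time-window event computation can be cast as a MapReduce-style batch job: the map phase replicates each event into every sliding window that covers its timestamp, and the reduce phase groups events per window for elaboration. The window size, slide, and all function names here are illustrative assumptions.

```python
# Hypothetical sketch: sliding time windows as a MapReduce-style batch job.
# WINDOW_SIZE and SLIDE are assumed parameters, not values from the paper.
from collections import defaultdict

WINDOW_SIZE = 60   # seconds covered by each window
SLIDE = 30         # seconds between consecutive window start times

def map_event(timestamp, payload):
    """Map phase: emit (window_start, payload) for every window
    whose range [start, start + WINDOW_SIZE) contains the event."""
    # Earliest window start that can still include this timestamp.
    first = ((timestamp - WINDOW_SIZE) // SLIDE + 1) * SLIDE
    first = max(first, 0)
    for start in range(first, timestamp + 1, SLIDE):
        if start <= timestamp < start + WINDOW_SIZE:
            yield start, payload

def reduce_windows(pairs):
    """Reduce phase: group payloads by window start,
    mimicking MapReduce's shuffle-and-reduce step."""
    windows = defaultdict(list)
    for start, payload in pairs:
        windows[start].append(payload)
    return dict(windows)

# Toy input: (timestamp_in_seconds, event_payload)
events = [(5, "a"), (35, "b"), (65, "c")]
pairs = [kv for ts, p in events for kv in map_event(ts, p)]
result = reduce_windows(pairs)
# Event "b" at t=35 falls into two overlapping windows (starts 0 and 30).
```

In a real Hadoop job the shuffle would partition the `(window_start, payload)` pairs across reducers, which is exactly where the input-organization strategies discussed in the paper matter: how events are laid out in the input files determines how much data each window computation must read and move.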