MapReduce-Based data stream processing over large history data

Authors:
Kaiyuan Qi;Zhuofeng Zhao;Jun Fang;Yanbo Han
Affiliations:
Cloud Computing Research Center, North China University of Technology, Beijing, China, and Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Cloud Computing Research Center, North China University of Technology, Beijing, China; Cloud Computing Research Center, North China University of Technology, Beijing, China; Cloud Computing Research Center, North China University of Technology, Beijing, China
Venue:
ICSOC'12 Proceedings of the 10th international conference on Service-Oriented Computing
Year:
2012



Abstract

With the development of Internet of Things applications based on sensor data, processing high-speed data streams over large-scale historical data poses a new challenge. This paper proposes a new programming model, RTMR, which improves the real-time capability of traditional batch-oriented MapReduce through preprocessing and caching, along with pipelining and localizing. Furthermore, to adapt cluster topologies to application characteristics and cluster environments, a method for constructing RTMR clusters based on model analysis is proposed. A benchmark built on an urban vehicle monitoring system shows that RTMR can provide real-time capability and scalability for data stream processing over large-scale data.
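
The general idea of preprocessing and caching can be illustrated with a minimal sketch. This is not RTMR's actual API or implementation, which the abstract does not describe; the function names, the per-key running mean, and the vehicle identifiers and speed values are all hypothetical. Historical records are reduced once into cached per-key aggregates, so each arriving stream record only needs a local lookup and an incremental update rather than a rescan of the history.

```python
from collections import defaultdict


def preprocess_history(history):
    """Batch step: reduce history into cached per-key [count, total] aggregates."""
    cache = defaultdict(lambda: [0, 0.0])
    for key, value in history:
        entry = cache[key]
        entry[0] += 1
        entry[1] += value
    return cache


def process_stream(cache, stream):
    """Stream step: merge each new record into the cache and emit the running mean."""
    for key, value in stream:
        entry = cache[key]  # local lookup, no pass over the raw history
        entry[0] += 1
        entry[1] += value
        yield key, entry[1] / entry[0]


# Hypothetical data: (vehicle id, observed speed) pairs.
history = [("vehicle-A", 40.0), ("vehicle-A", 60.0), ("vehicle-B", 30.0)]
cache = preprocess_history(history)
results = list(process_stream(cache, [("vehicle-A", 80.0), ("vehicle-B", 50.0)]))
```

In this sketch the cost per stream record is independent of the history size, which is the property that preprocessing and caching aim for; pipelining and localizing across cluster nodes are not modeled here.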