Muppet: MapReduce-style processing of fast data

Authors:
Wang Lam;Lu Liu;Sts Prasad;Anand Rajaraman;Zoheb Vacheri;AnHai Doan
Affiliations:
WalmartLabs;WalmartLabs;WalmartLabs;WalmartLabs;WalmartLabs;University of Wisconsin-Madison
Venue:
Proceedings of the VLDB Endowment
Year:
2012

Citing 11
Cited 8

Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
TelegraphCQ: continuous dataflow processing

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Scalable Parallel Programming with CUDA

Queue - GPU Computing
SPADE: the system s declarative stream processing engine

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
MapReduce online

NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
S4: Distributed Stream Computing Platform

ICDMW '10 Proceedings of the 2010 IEEE International Conference on Data Mining Workshops
A platform for scalable one-pass analytics using MapReduce

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Nova: continuous Pig/Hadoop workflows

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Incoop: MapReduce for incremental computations

Proceedings of the 2nd ACM Symposium on Cloud Computing
Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters

HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing

Execution and optimization of continuous queries with cyclops

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Fast data in the era of big data: Twitter's real-time related query suggestion architecture

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Data stream warehousing

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Grand challenge: MapReduce-style processing of fast sensor data

Proceedings of the 7th ACM international conference on Distributed event-based systems
Demo: elastic mapreduce-style processing of fast data

Proceedings of the 7th ACM international conference on Distributed event-based systems
Cache conscious star-join in MapReduce environments

Proceedings of the 2nd International Workshop on Cloud Intelligence
Memory-efficient groupby-aggregate using compressed buffer trees

Proceedings of the 4th annual Symposium on Cloud Computing
Nephele streaming: stream processing under QoS constraints at scale

Cluster Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

MapReduce has emerged as a popular method to process big data. In the past few years, however, not just big data, but fast data has also exploded in volume and availability. Examples of such data include sensor data streams, the Twitter Firehose, and Facebook updates. Numerous applications must process fast data. Can we provide a MapReduce-style framework so that developers can quickly write such applications and execute them over a cluster of machines, to achieve low latency and high scalability? In this paper we report on our investigation of this question, as carried out at Kosmix and WalmartLabs. We describe MapUpdate, a framework like MapReduce, but specifically developed for fast data. We describe Muppet, our implementation of MapUpdate. Throughout the description we highlight the key challenges, argue why MapReduce is not well suited to address them, and briefly describe our current solutions. Finally, we describe our experience and lessons learned with Muppet, which has been used extensively at Kosmix and WalmartLabs to power a broad range of applications in social media and e-commerce.