Encapsulation of parallelism in the Volcano query processing system
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
New sampling-based summary statistics for improving approximate query answers
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Eddies: continuously adaptive query processing
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
SEDA: an architecture for well-conditioned, scalable internet services
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
A scalable hash ripple join algorithm
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Data Mining and Knowledge Discovery
Highly available, fault-tolerant, parallel dataflows
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Fault-tolerance in the Borealis distributed stream processing system
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Network-Aware Operator Placement for Stream-Processing Systems
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Online Random Shuffling of Large Database Tables
IEEE Transactions on Knowledge and Data Engineering
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
Scalable approximate query processing with the DBO engine
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Automatic optimization of parallel dataflow programs
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
Ad-hoc data processing in the cloud
Proceedings of the VLDB Endowment
A comparison of approaches to large-scale data analysis
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Detecting large-scale system problems by mining console logs
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Proceedings of the VLDB Endowment
Hive: a warehousing solution over a map-reduce framework
Proceedings of the VLDB Endowment
Distributed online aggregations
Proceedings of the VLDB Endowment
Stateful bulk processing for incremental analytics
Proceedings of the 1st ACM symposium on Cloud computing
Online aggregation and continuous query support in MapReduce
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Hadoop++: making a yellow elephant run like a cheetah (without it even noticing)
Proceedings of the VLDB Endowment
Web data processing on the cloud
Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research
Large-scale incremental processing using distributed transactions and notifications
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Piccolo: building fast, distributed programs with partitioned tables
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
CIEL: a universal execution engine for distributed data-flow computing
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Mesos: a platform for fine-grained resource sharing in the data center
Proceedings of the 8th USENIX conference on Networked systems design and implementation
A hadoop-based packet trace processing tool
TMA'11 Proceedings of the Third international conference on Traffic monitoring and analysis
A latency and fault-tolerance optimizer for online parallel query plans
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Llama: leveraging columnar storage for scalable join processing in the MapReduce framework
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
A platform for scalable one-pass analytics using MapReduce
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Nova: continuous Pig/Hadoop workflows
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Exploring MapReduce efficiency with highly-distributed data
Proceedings of the second international workshop on MapReduce and its applications
In-situ MapReduce for log processing
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
G2: a graph processing system for diagnosing distributed systems
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Fault injection-based assessment of partial fault tolerance in stream processing applications
Proceedings of the 5th ACM international conference on Distributed event-based system
Proceedings of the Ninth International Workshop on Dynamic Analysis
Energy proportionality and performance in data parallel computing clusters
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Mining large distributed log data in near real time
SLAML '11 Managing Large-scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques
Incoop: MapReduce for incremental computations
Proceedings of the 2nd ACM Symposium on Cloud Computing
Elastic phoenix: malleable mapreduce for shared-memory systems
NPC'11 Proceedings of the 8th IFIP international conference on Network and parallel computing
Hadoop acceleration through network levitated merge
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Proceedings of the 2nd ACM SIGSPATIAL International Workshop on GeoStreaming
Building wavelet histograms on large data in MapReduce
Proceedings of the VLDB Endowment
Parallel data processing with MapReduce: a survey
ACM SIGMOD Record
DVM: towards a datacenter-scale virtual machine
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
MadLINQ: large-scale distributed matrix computation for the cloud
Proceedings of the 7th ACM european conference on Computer Systems
Cutting MapReduce cost with spot market
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
TransMR: data-centric programming beyond data parallelism
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
In-situ MapReduce for log processing
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
The HaLoop approach to large-scale iterative data analysis
The VLDB Journal — The International Journal on Very Large Data Bases
iMapReduce: A Distributed Computing Framework for Iterative Computation
Journal of Grid Computing
SkewTune: mitigating skew in mapreduce applications
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Camdoop: exploiting in-network aggregation for big data applications
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Optimizing data shuffling in data-parallel computation by understanding user-defined functions
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
P2P-MapReduce: Parallel data processing in dynamic Cloud environments
Journal of Computer and System Sciences
Adaptive MapReduce using situation-aware mappers
Proceedings of the 15th International Conference on Extending Database Technology
C-MR: continuously executing MapReduce workflows on multi-core processors
Proceedings of third international workshop on MapReduce and its Applications Date
Understanding the effects and implications of compute node related failures in hadoop
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Massively-parallel stream processing under QoS constraints with Nephele
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Maestro: Replica-Aware Map Scheduling for MapReduce
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
MapReduce Workload Modeling with Statistical Approach
Journal of Grid Computing
The seven deadly sins of cloud computing research
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases
Efficient multi-way theta-join processing using MapReduce
Proceedings of the VLDB Endowment
REX: recursive, delta-based data-centric computation
Proceedings of the VLDB Endowment
M3R: increased performance for in-memory Hadoop jobs
Proceedings of the VLDB Endowment
Muppet: MapReduce-style processing of fast data
Proceedings of the VLDB Endowment
SkewTune in action: mitigating skew in MapReduce applications
Proceedings of the VLDB Endowment
Auto-parallelizing stateful distributed streaming applications
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
AROMA: automated resource allocation and configuration of mapreduce environment in the cloud
Proceedings of the 9th international conference on Autonomic computing
Hierarchical merge for scalable MapReduce
Proceedings of the 2012 workshop on Management of big data systems
Spotting code optimizations in data-parallel pipelines through PeriSCOPE
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
SCALLA: A Platform for Scalable One-Pass Analytics Using MapReduce
ACM Transactions on Database Systems (TODS)
Multimedia Applications and Security in MapReduce: Opportunities and Challenges
Concurrency and Computation: Practice & Experience
Coflow: a networking abstraction for cluster applications
Proceedings of the 11th ACM Workshop on Hot Topics in Networks
True elasticity in multi-tenant data-intensive compute clusters
Proceedings of the Third ACM Symposium on Cloud Computing
Designing good algorithms for MapReduce and beyond
Proceedings of the Third ACM Symposium on Cloud Computing
On-the-fly task execution for speeding up pipelined mapreduce
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Future Generation Computer Systems
Assessing MapReduce for Internet Computing: A Comparison of Hadoop and BitDew-MapReduce
GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
MapReduce-Based data stream processing over large history data
ICSOC'12 Proceedings of the 10th international conference on Service-Oriented Computing
Scalable parallel computing on clouds using Twister4Azure iterative MapReduce
Future Generation Computer Systems
VScope: middleware for troubleshooting time-sensitive data center applications
Proceedings of the 13th International Middleware Conference
Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling
ACM Transactions on Architecture and Code Optimization (TACO)
Incremental stream processing using computational conflict-free replicated data types
Proceedings of the 3rd International Workshop on Cloud Data and Platforms
Stat!: an interactive analytics environment for big data
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Execution and optimization of continuous queries with cyclops
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
BlinkDB: queries with bounded errors and bounded response times on very large data
Proceedings of the 8th ACM European Conference on Computer Systems
Modeling performance of a parallel streaming engine: bridging theory and costs
Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering
Stream-monitoring with blockmon: convergence of network measurements and data analytics platforms
ACM SIGCOMM Computer Communication Review
MapReduce with communication overlap (MaRCO)
Journal of Parallel and Distributed Computing
Adaptive online scheduling in storm
Proceedings of the 7th ACM international conference on Distributed event-based systems
Grand challenge: MapReduce-style processing of fast sensor data
Proceedings of the 7th ACM international conference on Distributed event-based systems
Demo: elastic mapreduce-style processing of fast data
Proceedings of the 7th ACM international conference on Distributed event-based systems
Large-scale computation not at the cost of expressiveness
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Distributed data management using MapReduce
ACM Computing Surveys (CSUR)
SIDR: structure-aware intelligent data routing in Hadoop
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
CooMR: cross-task coordination for efficient data management in MapReduce programs
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Scalable Data Processing for Community Sensing Applications
Mobile Networks and Applications
Data-Intensive Cloud Computing: Requirements, Expectations, Challenges, and Solutions
Journal of Grid Computing
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Discretized streams: fault-tolerant streaming computation at scale
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
The family of mapreduce and large-scale data processing systems
ACM Computing Surveys (CSUR)
Proceedings of the 4th annual Symposium on Cloud Computing
Combination of in-memory graph computation with mapreduce: a subgraph-centric method of pagerank
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Does RDMA-based enhanced Hadoop MapReduce need a new performance model?
Proceedings of the 4th annual Symposium on Cloud Computing
Sampling estimators for parallel online aggregation
BNCOD'13 Proceedings of the 29th British National conference on Big Data
A catalog of stream processing optimizations
ACM Computing Surveys (CSUR)
Piranha: optimizing short jobs in Hadoop
Proceedings of the VLDB Endowment
Active data: a data-centric approach to data life-cycle management
PDSW '13 Proceedings of the 8th Parallel Data Storage Workshop
Scalable progressive analytics on big data in the cloud
Proceedings of the VLDB Endowment
Journal of Parallel and Distributed Computing
Parallel skyline queries over uncertain data streams in cloud computing environments
International Journal of Web and Grid Services
Nephele streaming: stream processing under QoS constraints at scale
Cluster Computing
IBM streams processing language: analyzing big data in motion
IBM Journal of Research and Development
Libra: divide and conquer to verify forwarding tables in huge networks
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
GRASS: trimming stragglers in approximation analytics
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, many implementations of MapReduce materialize the entire output of each map and reduce task before it can be consumed. In this paper, we propose a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well. We present a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see "early returns" from a job as it is being computed. Our Hadoop Online Prototype (HOP) also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing. HOP retains the fault tolerance properties of Hadoop and can run unmodified user-defined MapReduce programs.