Ad-hoc data processing in the cloud

Authors: Dionysios Logothetis; Kenneth Yocum
Affiliation: UCSD (both authors)
Venue: Proceedings of the VLDB Endowment
Year: 2008

Abstract

Ad-hoc data processing has proven to be a critical paradigm for Internet companies that process large volumes of unstructured data. However, with the emergence of cloud-based computing, where storage and CPU are outsourced to multiple third parties across the globe, these large data collections will be highly distributed and continuously evolving. Our demonstration combines the power and simplicity of the MapReduce abstraction with Mortar, a wide-scale distributed stream processor. Our incremental MapReduce operators avoid re-processing data, while the stream processor manages the placement and physical data flow of the operators across the wide area. We demonstrate a distributed web-indexing engine to which users can submit and deploy continuous MapReduce jobs. A visualization component illustrates both incremental indexing and index searches in real time.
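To make "incremental MapReduce operators avoid data re-processing" concrete, here is a minimal single-process sketch of an incremental inverted index. It is not Mortar's API or the authors' operator design. The names `map_doc`, `IncrementalIndex`, `ingest`, and `search` are hypothetical. The sketch assumes insert-only input and a reduce step (set union) that can merge new output into old results. Each batch maps only its new documents and folds their output into persistent reduce state, so earlier documents are never mapped again.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple


def map_doc(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Hypothetical map step: emit (term, doc_id) once per distinct lowercase term."""
    for term in set(text.lower().split()):
        yield term, doc_id


class IncrementalIndex:
    """Hypothetical incremental inverted index whose reduce state persists across batches.

    Assumes insert-only input; deletions and operator placement are not modeled.
    """

    def __init__(self) -> None:
        # Reduce state: term -> set of ids of documents containing that term.
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        # Counts map invocations, to show that old documents are not re-mapped.
        self.docs_mapped = 0

    def ingest(self, batch: Iterable[Tuple[str, str]]) -> None:
        """Map only the new documents and merge their output into the existing state."""
        for doc_id, text in batch:
            self.docs_mapped += 1
            for term, d in map_doc(doc_id, text):
                # Set union is associative and commutative, so merging this delta
                # into the old result gives the same index as a full recomputation.
                self.postings[term].add(d)

    def search(self, term: str) -> Set[str]:
        """Return a copy of the posting set for a term (empty if unseen)."""
        # dict.get does not trigger defaultdict's factory, so lookups add no entries.
        return set(self.postings.get(term.lower(), set()))


if __name__ == "__main__":
    index = IncrementalIndex()
    index.ingest([("d1", "cloud data processing"), ("d2", "stream processing")])
    index.ingest([("d3", "cloud stream")])  # only d3 is mapped in this batch
    print(sorted(index.search("cloud")))  # ['d1', 'd3']
    print(index.docs_mapped)  # 3
```

This incremental merge is correct only because the reduce function here can combine partial results. A reduce that cannot be merged this way, or input that includes updates and deletions, would need additional state.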