Ad-hoc data processing has proven to be a critical paradigm for Internet companies processing large volumes of unstructured data. However, the emergence of cloud-based computing, where storage and CPU are outsourced to multiple third parties across the globe, implies large collections of highly distributed and continuously evolving data. Our demonstration combines the power and simplicity of the MapReduce abstraction with Mortar, a wide-scale distributed stream processor. While our incremental MapReduce operators avoid reprocessing data, the stream processor manages the placement and physical data flow of the operators across the wide area. We demonstrate a distributed web-indexing engine to which users can submit and deploy continuous MapReduce jobs. A visualization component illustrates both incremental indexing and index searches in real time.
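To illustrate what "incremental MapReduce operators avoid reprocessing data" can mean for web indexing, here is a minimal single-process sketch. It assumes a simple inverted index whose reduce step is set union. Because set union is associative and commutative, each new batch of documents can be mapped and reduced on its own and then merged into stored state without re-reading earlier documents. All names here (`map_doc`, `reduce_postings`, `IncrementalIndex`, `ingest`, `search`) are hypothetical. This is not Mortar's operator API, and it does not model operator placement or wide-area data flow.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def map_doc(doc_id: str, text: str) -> List[Tuple[str, str]]:
    """Map phase: emit one (term, doc_id) pair per distinct lowercase term."""
    return [(term, doc_id) for term in set(text.lower().split())]


def reduce_postings(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Reduce phase: group doc ids by term into a partial posting list."""
    partial: Dict[str, Set[str]] = defaultdict(set)
    for term, doc_id in pairs:
        partial[term].add(doc_id)
    return partial


class IncrementalIndex:
    """Hypothetical inverted index maintained incrementally: each batch is
    mapped and reduced on its own, then merged into the stored state, so
    previously indexed documents are never re-read."""

    def __init__(self) -> None:
        self.state: Dict[str, Set[str]] = defaultdict(set)
        self.docs_mapped = 0  # counts map invocations to show no reprocessing

    def ingest(self, batch: Dict[str, str]) -> None:
        pairs: List[Tuple[str, str]] = []
        for doc_id, text in batch.items():
            pairs.extend(map_doc(doc_id, text))
            self.docs_mapped += 1
        # Set union is associative and commutative, so merging this batch's
        # partial result gives the same index as recomputing over all documents.
        for term, doc_ids in reduce_postings(pairs).items():
            self.state[term] |= doc_ids

    def search(self, term: str) -> Set[str]:
        """Return a copy of the posting set for a term (empty if absent)."""
        return set(self.state.get(term.lower(), set()))


if __name__ == "__main__":
    idx = IncrementalIndex()
    idx.ingest({"d1": "stream processing", "d2": "MapReduce processing"})
    idx.ingest({"d3": "incremental MapReduce"})  # only d3 is mapped here
    print(sorted(idx.search("mapreduce")), idx.docs_mapped)
```

After two batches, three documents have been mapped in total. The second batch touches only `d3`, yet a search for `mapreduce` returns both `d2` and `d3`. The sketch handles only insertions. Updating or deleting already-indexed documents would need additional machinery that is not shown here.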