Web data processing on the cloud

Authors:
Shahan Khatchadourian;Mariano Consens;Jerome Simeon
Affiliations:
University of Toronto;University of Toronto;IBM Corp.
Venue:
Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research
Year:
2010

Citing 5
Cited 0

MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Pig latin: a not-so-foreign language for data processing

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads

Proceedings of the VLDB Endowment
Ricardo: integrating R and Hadoop

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
MapReduce online

NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Cloud computing is emerging as a highly scalable, fault-tolerant, and cost-effective way to process large amounts of information on the Web. Thanks in part to new data processing paradigms designed with the Cloud in mind (such as MapReduce[1], HDFS[2], Cassandra[3], etc), it is quickly gaining acceptance as a viable platform for organizations that need to store, process, and publish large amounts of data. MapReduce is attractive for processing data on the Cloud, to a large extent, because of its simplicity and flexibility. Implementations of MapReduce usually include a simple API used to describe which part of the processing is done in parallel (Map phase), and which part of the processing is done after grouping data on a single machine (Reduce phase). It does not rely on a pre-existing data model, making it possible to process any kind of information independently of the model. Cloud applications have notably been used to perform off-line analytical processing such as analyzing web request logs, computing user recommendations, and understanding scenes in images [4].