Cluster computing for web-scale data processing
Proceedings of the 39th SIGCSE technical symposium on Computer science education
We have set up a new course on large-scale data processing using clusters. It introduces the concepts and design of distributed systems, along with recently developed ideas for processing large-scale data sets, such as the Google File System and the MapReduce programming framework. Students gain practical experience with distributed programming technologies through several small labs and one large multi-week final project. Labs and projects are completed using Hadoop, an open-source implementation of Google's distributed file system and MapReduce programming model. We have taught this class, named "Mass Data Processing Technology on Large Scale Clusters", for two years. This paper describes the design and delivery of the course, as well as the experiences and lessons learned.