Teaching large scale data processing: the five-week course and two years' experiences

  • Authors:
  • Kang Chen;Yubing Yin;Weimin Zheng

  • Affiliations:
  • Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China

  • Venue:
  • SCE '08: Proceedings of the First ACM Summit on Computing Education in China
  • Year:
  • 2008

Abstract

We have set up a new course on large-scale data processing using clusters. It introduces the concepts and design of distributed systems, covering newly developed ideas such as the Google File System and the MapReduce programming framework for processing large-scale data sets. Students gain practical experience with distributed programming technologies through several small labs and one large multi-week final project. Labs and projects are completed using Hadoop, an open-source implementation of Google's distributed file system and MapReduce programming model. We have taught this class, named "Mass Data Processing Technology on Large Scale Clusters," for two years. This paper describes the design and delivery of the course, as well as the experiences and lessons learned.
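To give a sense of the programming model the course's labs are built around, here is a minimal word-count sketch of the MapReduce pattern in plain Python. This is an illustration of the concept only, not Hadoop's Java API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for clarity.

```python
from collections import defaultdict

def map_phase(document):
    """Emit (word, 1) pairs, analogous to a Hadoop Mapper."""
    for word in document.split():
        yield (word, 1)

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts for one word, analogous to a Hadoop Reducer."""
    return (key, sum(values))

def word_count(documents):
    """Run the full map -> shuffle -> reduce pipeline over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In Hadoop itself, the shuffle step is performed by the framework between the map and reduce stages; students implement only the mapper and reducer.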