Exploring large-data issues in the curriculum: a case study with MapReduce

  • Authors: Jimmy Lin
  • Affiliations: University of Maryland, College Park
  • Venue: TeachCL '08 Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics
  • Year: 2008

Abstract

This paper describes the design of a pilot research and educational effort at the University of Maryland centered around technologies for tackling Web-scale problems. In the context of a "cloud computing" initiative led by Google and IBM, students and researchers are provided access to a computer cluster running Hadoop, an open-source Java implementation of Google's MapReduce framework. This technology provides an opportunity for students to explore large-data issues in the context of a course organized around teams of graduate and undergraduate students, in which they tackle open research problems in the human language technologies. This design represents one attempt to bridge traditional instruction with real-world, large-data research challenges.