Data-intensive text processing with MapReduce

  • Authors: Jimmy Lin; Chris Dyer

  • Affiliations: University of Maryland, College Park; University of Maryland, College Park

  • Venue: NAACL-Tutorials '09: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts
  • Year: 2009

Abstract

This half-day tutorial introduces participants to data-intensive text processing with the MapReduce programming model [1], using the open-source Hadoop implementation. The focus will be on scalability and the tradeoffs associated with distributed processing of large datasets. Content will include general discussions of algorithm design, presentations of illustrative algorithms, case studies in human language technology (HLT) applications, and practical advice on writing Hadoop programs and running Hadoop clusters.