MapDupReducer: detecting near duplicates over massive datasets

  • Authors:
  • Chaokun Wang;Jianmin Wang;Xuemin Lin;Wei Wang;Haixun Wang;Hongsong Li;Wanpeng Tian;Jun Xu;Rui Li

  • Affiliations:
  • Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;University of New South Wales and NICTA, Sydney, Australia;University of New South Wales and NICTA, Sydney, Australia;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China

  • Venue:
  • Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2010

Abstract

Near-duplicate detection benefits many applications, e.g., online news selection over the Web by keyword search. The purpose of this demo is to show the design and implementation of MapDupReducer, a MapReduce-based system capable of efficiently detecting near duplicates over massive datasets.
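
The abstract does not describe how MapDupReducer works internally, so the following is only a generic, minimal sketch of how near-duplicate detection can be phrased as map and reduce steps. It is not the authors' algorithm. Every detail is an assumption made for illustration: word-set tokenization, Jaccard similarity, the 0.8 threshold, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `near_duplicates`) are hypothetical, and the phases run in a single process rather than on a cluster.

```python
from collections import defaultdict
from itertools import combinations


def tokens(text):
    """Return the lower-cased set of words in a document (assumed tokenization)."""
    return frozenset(text.lower().split())


def map_phase(docs):
    """Map: emit one (token, doc_id) pair per distinct token in each document."""
    for doc_id, text in docs.items():
        for tok in tokens(text):
            yield tok, doc_id


def shuffle(pairs):
    """Group mapper output by key, as a MapReduce framework does between phases."""
    groups = defaultdict(set)
    for key, value in pairs:
        groups[key].add(value)
    return groups


def reduce_phase(groups):
    """Reduce: documents that share a token become candidate pairs."""
    candidates = set()
    for doc_ids in groups.values():
        candidates.update(combinations(sorted(doc_ids), 2))
    return candidates


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicates(docs, threshold=0.8):
    """Return sorted doc-id pairs whose word sets have Jaccard >= threshold."""
    candidates = reduce_phase(shuffle(map_phase(docs)))
    token_sets = {doc_id: tokens(text) for doc_id, text in docs.items()}
    return sorted(
        (a, b)
        for a, b in candidates
        if jaccard(token_sets[a], token_sets[b]) >= threshold
    )
```

As a usage example, calling `near_duplicates` on two headlines that share six of seven distinct words (Jaccard 6/7, about 0.86) reports them as a pair at the 0.8 threshold, while an unrelated headline that shares no words with them never becomes a candidate. Emitting every pair that shares any token is the simplest candidate generator but scales poorly on massive datasets; practical systems typically add filtering or signature schemes to reduce the number of candidates.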