Don't match twice: redundancy-free similarity computation with MapReduce

  • Authors:
  • Lars Kolb; Andreas Thor; Erhard Rahm

  • Affiliations:
  • University of Leipzig; University of Leipzig; University of Leipzig

  • Venue:
  • Proceedings of the Second Workshop on Data Analytics in the Cloud
  • Year:
  • 2013

Abstract

To improve the effectiveness of pair-wise similarity computation, state-of-the-art approaches assign objects to multiple overlapping clusters. This introduces redundant pair comparisons when similar objects share more than one cluster. We propose an approach that eliminates such redundant comparisons and that can be easily integrated into existing MapReduce implementations. We evaluate the approach on a real cloud infrastructure and show its effectiveness for all degrees of redundancy.
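
The redundancy described in the abstract can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the authors' implementation: it assumes that each object carries the full set of its cluster keys into the reduce step and that a pair is compared only in the smallest cluster key the two objects share. The names `map_phase`, `reduce_phase`, and `run` are hypothetical, and the record above does not confirm that the paper uses this exact rule.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(objects):
    """Emit (cluster_key, (object_id, all_keys)) once per cluster of each object."""
    for obj_id, keys in objects.items():
        key_set = frozenset(keys)
        for key in key_set:
            yield key, (obj_id, key_set)


def reduce_phase(grouped, deduplicate=True):
    """Yield the pairs to compare, cluster by cluster.

    Assumption for illustration: with deduplicate=True, a pair is compared
    only in the smallest cluster key that both objects share.
    """
    for key in sorted(grouped):
        members = sorted(grouped[key], key=lambda m: m[0])
        for (a, keys_a), (b, keys_b) in combinations(members, 2):
            if not deduplicate or min(keys_a & keys_b) == key:
                yield (a, b)


def run(objects, deduplicate=True):
    """Group the map output by cluster key (the shuffle step), then reduce."""
    grouped = defaultdict(list)
    for key, value in map_phase(objects):
        grouped[key].append(value)
    return list(reduce_phase(grouped, deduplicate))


# Hypothetical toy input: objects "a" and "b" share clusters "x" and "y".
objects = {"a": ["x", "y"], "b": ["x", "y"], "c": ["y"]}
naive_pairs = run(objects, deduplicate=False)
unique_pairs = run(objects, deduplicate=True)
```

In this toy input the pair ("a", "b") appears in both clusters, so the naive run compares it twice, while the deduplicated run skips it in cluster "y" because "x" is the smaller shared key. The check needs no communication between reducers, which is one reason a rule of this kind would fit into an existing MapReduce job.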