MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Fast calculation of pairwise mutual information for gene regulatory network reconstruction
Computer Methods and Programs in Biomedicine
Pairwise document similarity in large collections with MapReduce
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Cache oblivious matrix operations using Peano curves
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
No free lunch: brute force vs. locality-sensitive hashing for cross-lingual pairwise similarity
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Hi-index | 0.00 |
In this paper, we present a parallel method to evaluate functions on pairs of elements. It is a challenge to partition the Cartesian product of a set with itself in order to parallelize the function evaluation on all pairs. Our solution uses (a) replication of set elements to allow for partitioning and (b) aggregation of the results gathered for different copies of an element. Based on an execution model with nodes that execute tasks on local data without online communication, we present a generic algorithm and show how it can be implemented with MapReduce. Three different distribution schemes that define the partitioning of the Cartesian product are introduced, compared, and evaluated. Any one of the distribution schemes can be used to derive and implement a specific algorithm for parallel pairwise element computation.