Social content matching in MapReduce

Authors:
Gianmarco De Francisci Morales;Aristides Gionis;Mauro Sozio
Affiliations:
IMT Lucca and ISTI-CNR Pisa, Italy;Yahoo! Research, Barcelona, Spain;Max-Planck-Institut für Informatik, Saarbrücken, Germany
Venue:
Proceedings of the VLDB Endowment
Year:
2011

Abstract

Matching problems are ubiquitous. They occur in economic markets, labor markets, internet advertising, and elsewhere. In this paper we focus on an application of matching to social media. Our goal is to distribute content from information suppliers to information consumers. We seek to maximize the overall relevance of the content matched from suppliers to consumers while regulating the overall activity, e.g., ensuring that no consumer is overwhelmed with data and that all suppliers have a chance to deliver their content. We propose two matching algorithms, GreedyMR and StackMR, geared for the MapReduce paradigm. Both algorithms have provable approximation guarantees, and in practice they produce high-quality solutions. While both algorithms scale extremely well, we show that StackMR requires only a poly-logarithmic number of MapReduce steps, making it an attractive option for applications with very large datasets. We experimentally show the trade-offs between quality and efficiency of our solutions on two large datasets coming from real-world social-media web sites.
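
As an illustration of the problem setting only, the sketch below reads the task as a capacity-constrained weighted matching: each supplier-consumer pair has a relevance score, and each node has a cap on how many matches it may take part in. It is a plain sequential greedy heuristic, not the GreedyMR or StackMR algorithm of the paper, and it involves no MapReduce. The function name, the capacity dictionary, and the toy scores are all assumptions made for this example.

```python
def greedy_capacitated_matching(edges, capacity):
    """Sequential greedy sketch (hypothetical helper, not the paper's algorithm).

    edges:    iterable of (supplier, consumer, relevance) tuples
    capacity: dict mapping each node to the maximum number of matches it accepts
    Returns the list of selected edges, in the order they were picked.
    """
    remaining = dict(capacity)  # copy so the caller's dict is not modified
    matched = []
    # Consider the most relevant pairs first.
    for supplier, consumer, relevance in sorted(
        edges, key=lambda e: e[2], reverse=True
    ):
        # Keep the pair only if both endpoints still have spare capacity.
        if remaining.get(supplier, 0) > 0 and remaining.get(consumer, 0) > 0:
            matched.append((supplier, consumer, relevance))
            remaining[supplier] -= 1
            remaining[consumer] -= 1
    return matched


# Toy instance (made-up scores): two suppliers, two consumers, capacity 1 each.
edges = [
    ("s1", "c1", 9),
    ("s1", "c2", 8),
    ("s2", "c1", 7),
    ("s2", "c2", 1),
]
capacity = {"s1": 1, "s2": 1, "c1": 1, "c2": 1}

matching = greedy_capacitated_matching(edges, capacity)
total_relevance = sum(w for _, _, w in matching)
print(matching, total_relevance)
```

On this toy instance the greedy choice selects (s1, c1) and (s2, c2) for a total relevance of 10, whereas pairing s1 with c2 and s2 with c1 would give 15. That gap is the kind of quality trade-off an approximation guarantee bounds: a sequential greedy rule of this form is known to reach at least half of the optimal total weight.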