Scalable similarity-based neighborhood methods with MapReduce

Authors:
Sebastian Schelter; Christoph Boden; Volker Markl
Affiliations:
Technische Universität Berlin, Berlin, Germany (all authors)
Venue:
Proceedings of the sixth ACM conference on Recommender systems
Year:
2012

Abstract

Similarity-based neighborhood methods, a simple and popular approach to collaborative filtering, infer their predictions by finding users with similar taste or items that have been similarly rated. If the number of users grows to millions, the standard approach of sequentially examining each item and looking at all interacting users does not scale. To solve this problem, we develop a MapReduce algorithm for the pairwise item comparison and top-N recommendation problem that scales linearly with respect to a growing number of users. This parallel algorithm is able to work on partitioned data and is general in that it supports a wide range of similarity measures. We evaluate our algorithm on a large dataset consisting of 700 million song ratings from Yahoo! Music.
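
The pairwise item comparison described in the abstract can be illustrated with a small in-memory sketch. It is not the authors' implementation: the function names (`map_user`, `reduce_sum`, `item_cosine_similarities`), the choice of cosine similarity, and the single-process simulation of the map and reduce phases are all assumptions made for illustration. The sketch shows one way a user-centric formulation might work: each user's ratings are processed independently, so the input can be partitioned by user, and the work per user depends only on how many items that user has rated.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def map_user(user_ratings):
    """Map phase (hypothetical): process ONE user's ratings {item: rating}.

    Emits partial squared norms per item and partial dot products per
    co-rated item pair. No other user's data is needed, so users can be
    spread across partitions.
    """
    items = sorted(user_ratings)
    for item in items:
        yield ("norm", item), user_ratings[item] ** 2
    for a, b in combinations(items, 2):
        yield ("dot", a, b), user_ratings[a] * user_ratings[b]


def reduce_sum(emitted):
    """Reduce phase (hypothetical): sum all partial values sharing a key."""
    totals = defaultdict(float)
    for key, value in emitted:
        totals[key] += value
    return totals


def item_cosine_similarities(ratings_by_user):
    """Combine the summed parts into cosine similarities per item pair."""
    emitted = (
        pair
        for ratings in ratings_by_user.values()
        for pair in map_user(ratings)
    )
    totals = reduce_sum(emitted)
    similarities = {}
    for key, dot in totals.items():
        if key[0] == "dot":
            _, a, b = key
            norm_a = sqrt(totals[("norm", a)])
            norm_b = sqrt(totals[("norm", b)])
            similarities[(a, b)] = dot / (norm_a * norm_b)
    return similarities


# Toy data (invented for illustration only).
ratings = {
    "u1": {"A": 1.0, "B": 1.0},
    "u2": {"A": 1.0, "C": 1.0},
    "u3": {"A": 1.0, "B": 1.0},
}
sims = item_cosine_similarities(ratings)
```

Other similarity measures that decompose into per-user partial sums could be swapped in by changing what `map_user` emits and how the final combination step uses the totals. Item pairs that no user has co-rated, such as `("B", "C")` in the toy data, never appear in the output.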