A probabilistic definition of item similarity

  • Authors:
  • Oliver Jojic;Manu Shukla;Niranjan Bhosarekar

  • Affiliations:
  • Comcast Corporation, Washington, D.C., USA;Comcast Corporation, Washington, D.C., USA;Comcast Corporation, Washington, D.C., USA

  • Venue:
  • Proceedings of the fifth ACM conference on Recommender systems
  • Year:
  • 2011

Abstract

In item-based collaborative filtering, a critical intermediate step toward personalized recommendations is the definition of an item-similarity metric. Existing algorithms compute item similarity from the user-to-item ratings (cosine, Pearson, Jaccard, etc.). When computing the similarity between two items A and B, many of these algorithms divide the actual number of co-occurring users by some "difficulty" of co-occurrence. We refine this approach by defining item similarity as the ratio of the actual number of co-occurrences to the number of co-occurrences that would be expected if user choices were random. In the final step of our method, we compute personalized recommendations by applying the usage history of a user to the item-similarity matrix. The well-defined probabilistic meaning of our similarities allows us to further improve this final step. We measured the quality of our algorithm on a large real-world dataset. As part of Comcast's efforts to improve its personalized recommendations of movies and TV shows, several top recommender companies were invited to apply their algorithms to one year of Video-on-Demand usage data. Our algorithm tied for first place. This paper includes a MapReduce pseudocode implementation of our algorithm.
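
The ratio described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the expected number of co-occurrences is computed, so the code below assumes the simplest null model: each user picks items independently, giving an expected count of n_A * n_B / N for items A and B over N users. The function name `cooccurrence_ratio_similarity` and the toy `history` data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_ratio_similarity(user_items):
    """Return {(a, b): observed / expected} for item pairs with a < b.

    user_items maps each user to the set of items that user consumed.
    ASSUMPTION: "expected" uses an independence null model,
    expected = n_a * n_b / n_users. The paper may define it differently.
    """
    n_users = len(user_items)
    item_count = defaultdict(int)   # number of users per item
    pair_count = defaultdict(int)   # number of users per item pair

    for items in user_items.values():
        for item in items:
            item_count[item] += 1
        # Sorting makes every pair key canonical: (smaller, larger).
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    similarity = {}
    for (a, b), observed in pair_count.items():
        expected = item_count[a] * item_count[b] / n_users
        similarity[(a, b)] = observed / expected
    return similarity


# Hypothetical toy usage history: four users, three items.
history = {
    "u1": {"A", "B"},
    "u2": {"A", "B"},
    "u3": {"A"},
    "u4": {"C"},
}
similarity = cooccurrence_ratio_similarity(history)
```

Under this assumption, a value above 1 means two items co-occur more often than chance would predict, and a value below 1 means less often. Pairs that never co-occur are omitted from the result, which is equivalent to a similarity of 0.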