Data sparsity: a key disadvantage of user-based collaborative filtering?

  • Authors:
  • Biyun Hu; Zhoujun Li; Wenhan Chao

  • Affiliations:
  • State Key Laboratory of Software Development Environment, Beihang University, China (all three authors)

  • Venue:
  • APWeb'12: Proceedings of the 14th Asia-Pacific International Conference on Web Technologies and Applications
  • Year:
  • 2012

Abstract

Traditionally, data sparsity is seen as a key disadvantage of user-based collaborative filtering (CF). It is often assumed that data sparsity leaves two users with few co-rated items, or none at all, which makes their similarity information unreliable or unavailable and in turn degrades recommendation quality. However, this reasoning is often not verified experimentally. To analyze it in detail, we examine the effects of data sparsity on user-based CF experimentally in three steps. First, we investigate the relationship between data sparsity and the number of co-rated items. Second, we explore the characteristics of this number. Third, we evaluate how the number of co-rated items affects recommendation quality. Experimental results show that (a) as data sparsity increases, the number of co-rated items does not drop, and (b) recommendation quality does not drop as the number of co-rated items decreases. These results show that the traditional analysis of the effects of data sparsity is problematic. We hope this new conclusion can inform the design of CF algorithms.
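To make the quantities in the abstract concrete, here is a minimal sketch of how sparsity, co-rated items, similarity availability, and prediction relate in user-based CF. The abstract does not say which similarity measure or prediction formula the authors used, so this sketch assumes two common choices: Pearson correlation computed over co-rated items, and a mean-centered weighted average of neighbors' ratings (Resnick-style). The function names (`sparsity`, `co_rated`, `mean_co_rated`, `pearson`, `predict`) and the toy ratings are hypothetical and chosen for illustration; this is not the paper's experimental code.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Optional, Set

# user -> {item: rating}; all data used with these helpers is hypothetical.
Ratings = Dict[str, Dict[str, float]]


def sparsity(ratings: Ratings) -> float:
    """Fraction of empty cells in the user-item rating matrix."""
    items = {i for r in ratings.values() for i in r}
    filled = sum(len(r) for r in ratings.values())
    return 1.0 - filled / (len(ratings) * len(items))


def co_rated(ratings: Ratings, u: str, v: str) -> Set[str]:
    """Items rated by both u and v."""
    return set(ratings[u]) & set(ratings[v])


def mean_co_rated(ratings: Ratings) -> float:
    """Average number of co-rated items over all user pairs."""
    pairs = list(combinations(ratings, 2))
    return sum(len(co_rated(ratings, u, v)) for u, v in pairs) / len(pairs)


def pearson(ratings: Ratings, u: str, v: str) -> Optional[float]:
    """Pearson correlation over co-rated items (assumed measure); None if undefined."""
    common = sorted(co_rated(ratings, u, v))
    if len(common) < 2:
        return None  # too few co-rated items: similarity unavailable
    ru = [ratings[u][i] for i in common]
    rv = [ratings[v][i] for i in common]
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    num = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    den = sqrt(sum((a - mu) ** 2 for a in ru)) * sqrt(sum((b - mv) ** 2 for b in rv))
    return num / den if den > 0 else None  # constant ratings: undefined


def predict(ratings: Ratings, u: str, item: str) -> Optional[float]:
    """Mean-centered weighted average over neighbors who rated `item` (assumed formula)."""
    mean_u = sum(ratings[u].values()) / len(ratings[u])
    num = den = 0.0
    for v, rv in ratings.items():
        if v == u or item not in rv:
            continue
        s = pearson(ratings, u, v)
        if s is None:
            continue  # neighbor skipped: no usable similarity
        mean_v = sum(rv.values()) / len(rv)
        num += s * (rv[item] - mean_v)
        den += abs(s)
    return mean_u + num / den if den > 0 else None  # None: no usable neighbors


if __name__ == "__main__":
    toy = {
        "alice": {"i1": 5, "i2": 3, "i3": 4},
        "bob": {"i1": 3, "i2": 1, "i3": 2, "i4": 3},
        "carol": {"i1": 4, "i2": 3, "i4": 5},
    }
    print(f"sparsity = {sparsity(toy):.3f}")
    print(f"mean co-rated items per pair = {mean_co_rated(toy):.3f}")
    print(f"sim(alice, bob) = {pearson(toy, 'alice', 'bob'):.3f}")
    print(f"predicted rating of i4 for alice = {predict(toy, 'alice', 'i4'):.3f}")
```

Removing entries from the toy dictionaries and re-running is a quick way to see how co-rated counts and the availability of similarities respond to sparsity on toy data. It says nothing about the datasets or results in the paper. The sketch also uses every neighbor with a defined similarity rather than a top-k neighborhood, which is a simplifying assumption.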