Data sparsity: a key disadvantage of user-based collaborative filtering?

Authors:
Biyun Hu; Zhoujun Li; Wenhan Chao
Affiliations:
State Key Laboratory of Software Development Environment, Beihang University, China;State Key Laboratory of Software Development Environment, Beihang University, China;State Key Laboratory of Software Development Environment, Beihang University, China
Venue:
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Year:
2012

Abstract

Traditionally, data sparsity is seen as a key disadvantage of user-based collaborative filtering (CF). It is often assumed that data sparsity leaves two users with few or no co-rated items, which makes similarity information unreliable or unavailable and in turn degrades recommendation quality. However, this line of reasoning is rarely verified experimentally. To analyze it in detail, we examine the effects of data sparsity on user-based CF in three steps. First, we investigate the relationship between data sparsity and the number of co-rated items. Second, we explore the characteristics of that number. Third, we evaluate the effect of that number on recommendation quality. Experimental results show that (a) as data sparsity increases, the number of co-rated items does not drop, and (b) recommendation quality does not drop as the number of co-rated items decreases. These results show that the traditional analysis of the effects of data sparsity is problematic. We hope that this new conclusion can inform the design of CF algorithms.
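
The quantities discussed in the abstract can be made concrete with a small sketch. The Python code below is illustrative only and is not taken from the paper: the toy ratings, the function names (`sparsity`, `co_rated_counts`, `pearson`), and the choice of Pearson correlation with means computed over co-rated items are all assumptions. It shows how sparsity, the number of co-rated items per user pair, and a user-user similarity might be measured.

```python
from itertools import combinations
from math import sqrt


def sparsity(ratings, n_items):
    """Fraction of user-item cells that carry no rating."""
    n_rated = sum(len(items) for items in ratings.values())
    return 1.0 - n_rated / (len(ratings) * n_items)


def co_rated_counts(ratings):
    """Number of items rated by both users, for every user pair."""
    return {
        (u, v): len(ratings[u].keys() & ratings[v].keys())
        for u, v in combinations(sorted(ratings), 2)
    }


def pearson(ratings, u, v):
    """Pearson correlation over co-rated items (assumed variant).

    Returns None when fewer than two items are co-rated or when one
    user's co-rated ratings have zero variance, i.e. when the
    similarity is unavailable.
    """
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return None
    mean_u = sum(ratings[u][i] for i in common) / len(common)
    mean_v = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return None
    return num / (den_u * den_v)


# Hypothetical toy data: 3 users, 4 items, ratings on a 1-5 scale.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i3": 2},
    "u3": {"i4": 1},
}

print(sparsity(ratings, n_items=4))
print(co_rated_counts(ratings))
print(pearson(ratings, "u1", "u2"), pearson(ratings, "u1", "u3"))
```

In this toy data, the pair `u1`/`u3` shares no items, so its similarity is unavailable, which is the situation the traditional argument attributes to sparsity. Repeating such counts on real data sets at different sparsity levels is one way to check whether that argument holds empirically.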