A crucial operation in memory-based collaborative filtering (CF) is determining the nearest neighbors (NNs) of users or items. This paper addresses two phenomena that emerge when CF algorithms perform NN search in the high-dimensional spaces typical of CF applications. The first is similarity concentration; the second is the appearance of hubs, i.e., points that appear in the $k$-NN lists of many other points. Through theoretical analysis and experimental evaluation, we show that these phenomena are inherent properties of high-dimensional space, unrelated to other data properties such as sparsity, and that they can affect CF algorithms by calling into question the meaning and representativeness of the discovered NNs. Moreover, we show that the phenomena are not easily mitigated by dimensionality reduction. Studying these phenomena aims to provide a better understanding of the limitations of memory-based CF and to motivate the development of new algorithms that overcome them.
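The two phenomena named in the abstract can be observed on synthetic data. The following is a minimal sketch, not the paper's experimental setup. It assumes dense i.i.d. uniform vectors as stand-ins for user profiles and cosine similarity as the NN measure. The function names (`cosine_similarity_matrix`, `similarity_spread`, `k_occurrences`, `skewness`) are hypothetical helpers introduced only for illustration. Similarity concentration is measured here as the standard deviation of pairwise similarities, and hubness as the skewness of the $k$-occurrence counts (how often each point appears in other points' $k$-NN lists).

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarities; the diagonal is set to -inf to exclude self-matches."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / norms
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)
    return S


def similarity_spread(S):
    """Standard deviation of the off-diagonal similarities (small value = concentrated)."""
    return S[np.isfinite(S)].std()


def k_occurrences(S, k):
    """For each point, count how many other points list it among their k nearest neighbors."""
    n = S.shape[0]
    # Indices of the k largest similarities in each row (order within the k is irrelevant).
    knn = np.argpartition(-S, k, axis=1)[:, :k]
    return np.bincount(knn.ravel(), minlength=n)


def skewness(counts):
    """Third standardized moment; a large positive value indicates a few strong hubs."""
    centered = counts - counts.mean()
    return (centered ** 3).mean() / counts.std() ** 3


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # assumed seed, only for repeatability
    n, k = 500, 5                   # assumed sample size and neighborhood size
    for d in (3, 30, 300, 3000):
        X = rng.random((n, d))      # synthetic non-negative "profiles"
        S = cosine_similarity_matrix(X)
        counts = k_occurrences(S, k)
        print(f"d={d:5d}  spread={similarity_spread(S):.4f}  "
              f"max N_k={counts.max():3d}  skew(N_k)={skewness(counts):.2f}")
```

Running the script prints one row per dimensionality. Under these assumptions, the spread of similarities should shrink as `d` grows, while the $k$-occurrence distribution can be inspected for points that appear in unusually many neighbor lists. Real CF data is sparse and non-uniform, so the numbers are illustrative only.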