GroupLens: an open architecture for collaborative filtering of netnews
CSCW '94 Proceedings of the 1994 ACM conference on Computer supported cooperative work
The space complexity of approximating the frequency moments
Journal of Computer and System Sciences
Min-wise independent permutations
Journal of Computer and System Sciences - 30th annual ACM symposium on theory of computing
A small approximately min-wise independent family of hash functions
Journal of Algorithms
Item-based collaborative filtering recommendation algorithms
Proceedings of the 10th international conference on World Wide Web
Estimating Rarity and Similarity over Data Stream Windows
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
A simple algorithm for finding frequent elements in streams and bags
ACM Transactions on Database Systems (TODS)
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
Summarizing and mining inverse distributions on data streams via dynamic inverse sampling
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Stable distributions, pseudorandom generators, embeddings, and data stream computation
Journal of the ACM (JACM)
Data Streams: Models and Algorithms (Advances in Database Systems)
Data Streams: Models and Algorithms (Advances in Database Systems)
Google news personalization: scalable online collaborative filtering
Proceedings of the 16th international conference on World Wide Web
Less hashing, same performance: building a better bloom filter
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
Range-Efficient Counting of Distinct Elements in a Massive Data Stream
SIAM Journal on Computing
Sketching Algorithms for Approximating Rank Correlations in Collaborative Filtering Systems
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Sketching techniques for collaborative filtering
IJCAI'09 Proceedings of the 21st international jont conference on Artifical intelligence
A survey of collaborative filtering techniques
Advances in Artificial Intelligence
Proceedings of the 19th international conference on World wide web
An optimal algorithm for the distinct elements problem
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
On the k-independence required by linear probing and minwise independence
ICALP'10 Proceedings of the 37th international colloquium conference on Automata, languages and programming
Fingerprinting ratings for collaborative filtering: theoretical and empirical analysis
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Fast moment estimation in data streams in optimal space
Proceedings of the forty-third annual ACM symposium on Theory of computing
Fast locality-sensitive hashing
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Exponential time improvement for min-wise based algorithms
Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
Hi-index | 0.00 |
A key building block for collaborative filtering recommender systems is finding users with similar consumption patterns. Given access to the full data regarding the items consumed by each user, one can directly compute the similarity between any two users. However, for massive recommender systems such a naive approach requires a high running time and may be intractable in terms of the space required to store the full data. One way to overcome this is using sketching, a technique that represents massive datasets concisely, while still allowing calculating properties of these datasets. Sketching methods maintain very short fingerprints of the item sets of users, which allow approximately computing the similarity between sets of different users. The state of the art sketch [22] has a very low space complexity, and a recent technique [14] shows how to exponentially speed up the computation time involved in building the fingerprints. Unfortunately, these methods are incompatible, forcing a choice between low running time or a small sketch size. We propose an alternative sketching approach, which achieves both a low space complexity similar to that of [22] and a low time complexity similar to [14]. We empirically evaluate our algorithm using the Netflix dataset. We analyze the running time and the sketch size of our approach and compare them to alternatives. Further, we show that in practice the accuracy achieved by our approach is even better than the accuracy guaranteed by the theoretical bounds, so it suffices to use even shorter fingerprints to obtain high quality results.