Making the most of your data: KDD Cup 2007 "How Many Ratings" winner's report
ACM SIGKDD Explorations Newsletter - Special issue on visual analytics
KDD Cup 2007 focuses on predicting aspects of movie rating behavior. We present our prediction method for Task 1, "Who Rated What in 2006", in which the goal is to predict which users rated which movies in 2006. We use a combination of the following methods, listed in order of accuracy:

• The predicted number of ratings for each movie, based on time series analysis, also using movie and DVD release dates and detecting movie series by the edit distance of their titles.
• The predicted number of ratings by each user, using the fact that ratings were sampled proportionally to the margin.
• A low-rank approximation of the 0-1 matrix of known user-movie pairs with ratings.
• Prediction using the movie-movie similarity matrix.
• Association rules obtained by frequent sequence mining of user ratings, treated as ordered itemsets.

By combining the predictions with linear regression we obtained a prediction with a root mean squared error of 0.256. The first runner-up result was 0.263, while a pure all-zeros prediction already gives 0.279, indicating the hardness of the task.
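The movie-series detection by title edit distance can be sketched with the classic Levenshtein dynamic program. The threshold and the lowercasing are assumptions for illustration; the report does not state the exact matching rule used.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein edit distance via dynamic programming (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def likely_same_series(title_a: str, title_b: str, max_dist: int = 3) -> bool:
    # Hypothetical cutoff: titles within a few edits are treated as one series.
    return edit_distance(title_a.lower(), title_b.lower()) <= max_dist
```

Sequel titles typically differ only in a short suffix ("II" vs. "III"), so a small edit-distance cutoff captures them while keeping unrelated titles apart.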
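The low-rank approximation of the 0-1 user-movie matrix can be sketched with a truncated SVD; the rank and the toy matrix below are illustrative assumptions, not the values used in the report.

```python
import numpy as np

def low_rank_scores(R: np.ndarray, k: int) -> np.ndarray:
    """Rank-k reconstruction of the 0-1 user-movie matrix R.
    Larger reconstructed entries suggest more likely (user, movie) pairs."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k] * s[:k] @ Vt[:k, :]

# toy 0-1 matrix: rows are users, columns are movies, 1 = known rating
R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)
scores = low_rank_scores(R, k=2)
```

On this block-structured toy matrix, the rank-2 reconstruction scores a user's unseen movies within their block higher than movies outside it.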
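One plausible way to use a movie-movie similarity matrix for this task is to build cosine similarities between movie columns of the 0-1 matrix and score each (user, movie) pair by the user's similarity-weighted neighbors. This is a sketch of a standard item-item scheme; the report's exact similarity measure and weighting are not specified here.

```python
import numpy as np

def item_similarity_scores(R: np.ndarray) -> np.ndarray:
    """Cosine movie-movie similarity from the 0-1 matrix R, then
    score each (user, movie) pair via similarity-weighted neighbors."""
    norms = np.linalg.norm(R, axis=0, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty columns
    Rn = R / norms
    S = Rn.T @ Rn                    # movie-movie cosine similarity
    np.fill_diagonal(S, 0.0)         # ignore self-similarity
    return R @ S                     # user x movie scores

R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)
scores = item_similarity_scores(R)
```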
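The final linear-regression blend and the RMSE metric can be sketched as an ordinary least-squares fit of the component predictions against the 0-1 target; the toy predictors below are illustrative assumptions.

```python
import numpy as np

def blend_and_rmse(preds: np.ndarray, target: np.ndarray):
    """Least-squares blend of component predictions (columns of preds)
    against the 0-1 target; returns the weights and the blend's RMSE."""
    w, *_ = np.linalg.lstsq(preds, target, rcond=None)
    blended = preds @ w
    rmse = float(np.sqrt(np.mean((blended - target) ** 2)))
    return w, rmse

# toy example: two component predictors against a 0-1 target
target = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
preds = np.column_stack([[0.9, 0.2, 0.8, 0.1, 0.7],
                         [0.5, 0.5, 0.5, 0.5, 0.5]])
w, rmse = blend_and_rmse(preds, target)
```

Because the least-squares fit optimizes over all linear combinations, the blended RMSE can never exceed that of any single component predictor, which is why combining the five methods improved on each one alone.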