KDD Cup 2007 task 1 winner report

Authors:
Miklós Kurucz; András A. Benczúr; Tamás Kiss; István Nagy; Adrienn Szabó; Balázs Torma
Affiliations:
Data Mining and Web Search Research Group, Informatics Laboratory, Computer and Automation Research Institute of the Hungarian Academy of Sciences (all authors)
Venue:
ACM SIGKDD Explorations Newsletter - Special issue on visual analytics
Year:
2007



Abstract

KDD Cup 2007 focuses on predicting aspects of movie rating behavior. We present our prediction method for Task 1, "Who Rated What in 2006", where the goal is to predict which users rated which movies in 2006. We use a combination of the following methods, listed in the order of their accuracy:

• The predicted number of ratings for each movie, based on time series analysis, also using movie and DVD release dates and movie series detection by the edit distance of the titles.
• The predicted number of ratings by each user, using the fact that ratings were sampled proportional to the margin.
• The low rank approximation of the 0-1 matrix of known user-movie pairs with rating.
• Prediction by using the movie-movie similarity matrix.
• Association rules obtained by frequent sequence mining of user ratings considered as ordered itemsets.

By combining the predictions by linear regression, we obtained a prediction with root mean squared error 0.256. The first runner-up result was 0.263, while a pure all-zeroes prediction already gives 0.279, indicating the hardness of the task.
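
To make the combination step concrete, the following is a minimal sketch, not the authors' implementation. It blends three of the signals named in the abstract (a low rank approximation of the 0-1 user-movie matrix, per-user rating counts, and per-movie rating counts) by least-squares linear regression and compares the result with the all-zeroes baseline. The data is synthetic, and all function names, matrix sizes, and the rank of 5 are assumptions chosen for illustration. One fact worth noting: on 0-1 labels, the RMSE of an all-zeroes prediction equals the square root of the fraction of positive pairs, so a baseline of 0.279 would correspond to roughly 7.8% positives in the evaluated sample.

```python
import numpy as np


def rmse(pred, truth):
    """Root mean squared error between two arrays of equal shape."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


def low_rank_scores(matrix, rank):
    """Rank-`rank` approximation of a 0-1 user-movie matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def blend(features, target):
    """Least-squares weights (intercept first) for combining predictors."""
    design = np.column_stack([np.ones(len(target)), features])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights


def apply_blend(weights, features):
    """Apply blending weights produced by `blend` to a feature matrix."""
    design = np.column_stack([np.ones(features.shape[0]), features])
    return design @ weights


# Synthetic stand-in data (assumption): each user-movie pair is rated with a
# probability given by the product of a user rate and a movie rate.
rng = np.random.default_rng(0)
n_users, n_movies = 200, 100
user_rate = rng.uniform(0.05, 0.6, size=n_users)
movie_rate = rng.uniform(0.05, 0.6, size=n_movies)
prob = np.outer(user_rate, movie_rate)
history = (rng.random(prob.shape) < prob).astype(float)  # known pairs
future = (rng.random(prob.shape) < prob).astype(float)   # pairs to predict

# One row per (user, movie) pair in row-major order; one column per predictor.
features = np.column_stack([
    low_rank_scores(history, rank=5).ravel(),
    np.repeat(history.mean(axis=1), n_movies),  # per-user activity
    np.tile(history.mean(axis=0), n_users),     # per-movie popularity
])
target = future.ravel()

# Fit the blend on half of the pairs and evaluate on the other half.
order = rng.permutation(len(target))
fit_idx, test_idx = order[: len(order) // 2], order[len(order) // 2:]
weights = blend(features[fit_idx], target[fit_idx])

blended_rmse = rmse(apply_blend(weights, features[test_idx]), target[test_idx])
zero_rmse = rmse(np.zeros(len(test_idx)), target[test_idx])
print(f"blended RMSE: {blended_rmse:.3f}, all-zeroes RMSE: {zero_rmse:.3f}")
```

Because the all-zeroes prediction is itself a linear combination of the predictors (all weights zero), the least-squares blend can never do worse than that baseline on the pairs it was fitted on; whether the gain carries over to held-out pairs depends on the quality of the individual predictors.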