A classical predictive modeling approach for task "Who rated what?" of the KDD CUP 2007

  • Authors: Jorge Sueiras, Alfonso Salafranca, Jose Luis Florez

  • Affiliations: Neo Metrics, Madrid, Spain (all authors)

  • Venue: ACM SIGKDD Explorations Newsletter - Special issue on visual analytics
  • Year: 2007


Abstract

This paper describes one possible solution to the "Who rated what?" task of the KDD CUP 2007. The proposed solution is a history-based model that predicts whether a user will rate a given movie. The key points of our approach are (1) the estimation of the model baseline, (2) the definition of the explanatory variables and (3) the choice of mathematical model form. Because the outcome is binary, estimating the true baseline (the ratio of 1's in the test data) is critical to making correct predictions. In parallel, to improve the predictive power of the model, we constructed the input variables carefully. These explanatory variables fall into three groups: user voting behaviour variables, movie characteristics and user-movie interactions. Finally, the mathematical model form (linear logistic regression) was chosen from among several competing model forms.
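
To illustrate why the baseline matters for a binary logistic model, the following is a minimal sketch and not the authors' implementation. It assumes a model fitted on data whose share of 1's (`train_rate`) differs from the estimated share in the test data (`test_rate`), and applies the standard intercept shift on the log-odds scale. The function names, the weights and the example rates are all hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit: maps a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(intercept, weights, features):
    """Linear logistic regression score for one user-movie pair."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def recalibrate(p, train_rate, test_rate):
    """Shift a predicted probability from the training base rate
    to an assumed test base rate by adjusting the log-odds."""
    offset = logit(test_rate) - logit(train_rate)
    return sigmoid(logit(p) + offset)


# Hypothetical example: a model trained on balanced data (50% ones),
# applied where only about 10% of user-movie pairs are expected to be rated.
p_raw = predict(intercept=0.0, weights=[0.8, -0.3], features=[1.0, 2.0])
p_adj = recalibrate(p_raw, train_rate=0.5, test_rate=0.1)
```

The shift changes only the intercept, so the ranking of user-movie pairs is preserved while the average predicted probability moves towards the assumed test baseline.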