Citation prediction using time series approach KDD Cup 2003 (task 1)

  • Authors:
  • J. N. Manjunatha; K. R. Sivaramakrishnan; Raghavendra Kumar Pandey; M. Narasimha Murthy

  • Affiliations:
  • Indian Institute of Science, Bangalore, India (all authors)

  • Venue:
  • ACM SIGKDD Explorations Newsletter
  • Year:
  • 2003

Abstract

In this article we describe our experiences in building the winning system for Task 1 of KDD Cup 2003. This year's competition was based on a very large archive of research papers that provides an unusually comprehensive snapshot of a particular social network in action: in addition to the full text of the papers, it includes both the explicit citation structure and partial data on the downloading of papers by users. It provides a framework for testing general network and usage mining techniques, which can be explored via four varied and interesting tasks. Each task is a separate competition with its own specific goal. In Task 1 the goal is to predict the change in the number of citations to each paper in the archive over time. The contest was very challenging because the given data was not in a format suitable for conventional data mining techniques, so we had to do a considerable amount of data processing. There were also several different sources of data, such as the TeX files, the citation graph, and the slac-data database, so we had to decide which sources to use and how much to rely on each.