Contextual rule-based feature engineering for author-paper identification

  • Authors:
  • Erheng Zhong; Lianghao Li; Naiyan Wang; Ben Tan; Yin Zhu; Lili Zhao; Qiang Yang

  • Affiliations:
  • Hong Kong University of Science and Technology, Hong Kong; Hong Kong University of Science and Technology, Hong Kong; Hong Kong University of Science and Technology, Hong Kong; Hong Kong University of Science and Technology, Hong Kong; Hong Kong University of Science and Technology, Hong Kong; Hong Kong University of Science and Technology, Hong Kong; Hong Kong University of Science and Technology, Hong Kong and Huawei Noah's Ark Lab, Hong Kong

  • Venue:
  • Proceedings of the 2013 KDD Cup 2013 Workshop
  • Year:
  • 2013

Abstract

We present the ideas and methods we used to address the KDD Cup 2013 challenge on author-paper identification. We first formulate the problem as a personalized ranking task and then solve it within a supervised learning framework. The key point is to eliminate the papers incorrectly assigned to a given author, based on existing records. We choose gradient boosted trees as our main classifier. Our exploration shows that effective feature engineering is the most critical factor in our results. In this paper, we formulate this process as a unified framework that constructs features from contextual information and combines machine learning techniques with human intelligence. In addition, we suggest several strategies for parsing authors' names, which improve the prediction results significantly. Divide-and-conquer model building and model averaging also improve prediction precision.
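
The personalized ranking and model averaging steps mentioned in the abstract might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the (author_id, paper_id) score dictionaries, and the example scores are all hypothetical, and the scores stand in for the outputs of trained classifiers such as gradient boosted trees.

```python
from collections import defaultdict
from statistics import mean


def average_scores(model_scores):
    """Average per-pair scores across models (simple model averaging).

    model_scores: list of dicts mapping (author_id, paper_id) -> score.
    Assumption: every dict covers the same set of pairs.
    """
    pairs = model_scores[0].keys()
    return {pair: mean(scores[pair] for scores in model_scores) for pair in pairs}


def rank_papers_per_author(pair_scores):
    """Turn pair scores into one ranked paper list per author."""
    by_author = defaultdict(list)
    for (author_id, paper_id), score in pair_scores.items():
        by_author[author_id].append((score, paper_id))
    # Highest score first; ties are broken by paper_id for determinism.
    return {
        author: [paper for _, paper in sorted(items, key=lambda t: (-t[0], t[1]))]
        for author, items in by_author.items()
    }


# Hypothetical scores from two models for the same author-paper pairs.
model_a = {(1, 10): 0.9, (1, 11): 0.2, (1, 12): 0.6, (2, 10): 0.1, (2, 13): 0.8}
model_b = {(1, 10): 0.7, (1, 11): 0.4, (1, 12): 0.8, (2, 10): 0.3, (2, 13): 0.6}

ranking = rank_papers_per_author(average_scores([model_a, model_b]))
```

Under this sketch, papers near the bottom of an author's list are the candidates for having been incorrectly assigned to that author.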