Feature engineering and tree modeling for author-paper identification challenge

  • Authors:
  • Jiefei Li; Xiaocong Liang; Weijie Ding; Weidong Yang; Rong Pan

  • Affiliations:
  • Sun Yat-Sen University; Sun Yat-Sen University; Sun Yat-Sen University; Sun Yat-Sen University; Sun Yat-Sen University

  • Venue:
  • Proceedings of the 2013 KDD Cup 2013 Workshop
  • Year:
  • 2013

Abstract

The ability to search literature and to collect and aggregate metrics around publications is a central tool for modern research. Both academic and industry researchers across hundreds of scientific disciplines, from astronomy to zoology, increasingly rely on search to understand what has been published and by whom. Microsoft Academic Search is an open platform that provides a variety of metrics and experiences for the research community in addition to literature search. Because its data are collected from many sources, the profile of an author with an ambiguous name tends to contain noise: papers written by other people are incorrectly assigned to that profile. KDD Cup 2013 Track 1 challenges participants to determine which papers in an author profile were truly written by the given author. In this work, we show how to use tree-based models to accurately predict whether the given author wrote each paper in the profile, and we combine feature engineering with these models to exploit their strengths. This paper introduces two kinds of tree-based models, gradient boosting decision trees (GBDT) [4] and regularized greedy forest (RGF) [5], and describes in detail the learning algorithms and how features can be generated for the task. The experimental results show the effectiveness of the proposed approach.
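The abstract frames the task as a yes/no decision for each (author, paper) pair in a profile, made by boosted trees over engineered features. The sketch below is a minimal pure-Python illustration of gradient boosting on such pair-level feature vectors, assuming depth-1 trees (stumps), logistic loss, and second-order (Newton) leaf values with an L2 penalty `reg_lambda`. It is not the authors' GBDT configuration and does not cover RGF. The two features (`times_pair_listed`, `name_similarity`), the toy data, and all names (`Stump`, `fit_stump`, `GradientBoostedStumps`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """Depth-1 regression tree: one split on one feature."""
    feature: int
    threshold: float
    left_value: float   # output when x[feature] <= threshold
    right_value: float  # output when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, grad, hess, reg_lambda: float) -> Stump:
    """Choose the split maximizing G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda);
    each leaf gets the Newton step -G/(H+lambda)."""
    best, best_gain = None, -math.inf
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            gl, hl = sum(grad[i] for i in left), sum(hess[i] for i in left)
            gr, hr = sum(grad[i] for i in right), sum(hess[i] for i in right)
            gain = gl * gl / (hl + reg_lambda) + gr * gr / (hr + reg_lambda)
            if gain > best_gain:
                best_gain = gain
                best = Stump(j, t, -gl / (hl + reg_lambda), -gr / (hr + reg_lambda))
    if best is None:
        # No feature can split the rows: fall back to a constant update.
        value = -sum(grad) / (sum(hess) + reg_lambda)
        best = Stump(0, math.inf, value, value)
    return best


class GradientBoostedStumps:
    def __init__(self, n_rounds: int = 50, learning_rate: float = 0.3, reg_lambda: float = 1.0):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.reg_lambda = reg_lambda
        self.base_score = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X, y):
        # Start from the log-odds of the positive (confirmed) rate.
        p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.base_score = math.log(p0 / (1 - p0))
        margins = [self.base_score] * len(y)
        for _ in range(self.n_rounds):
            probs = [sigmoid(m) for m in margins]
            grad = [p - yi for p, yi in zip(probs, y)]  # d(log loss)/d(margin)
            hess = [p * (1 - p) for p in probs]         # second derivative
            stump = fit_stump(X, grad, hess, self.reg_lambda)
            self.stumps.append(stump)
            margins = [m + self.learning_rate * stump.predict(x) for m, x in zip(margins, X)]
        return self

    def predict_proba(self, X) -> List[float]:
        return [sigmoid(self.base_score
                        + self.learning_rate * sum(s.predict(x) for s in self.stumps))
                for x in X]


# Made-up author-paper pairs: [times_pair_listed, name_similarity] (hypothetical features).
X = [[3, 0.95], [2, 0.90], [2, 0.80], [1, 0.85],
     [1, 0.30], [1, 0.20], [1, 0.40], [2, 0.10]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = author truly wrote the paper, 0 = misassigned
model = GradientBoostedStumps().fit(X, y)
probs = model.predict_proba(X)
```

Stumps keep the example short; practical GBDT implementations grow deeper trees and often subsample rows or features. To turn the scores into an answer per author, one could sort each author's candidate papers by `predict_proba`.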