Combination of feature engineering and ranking models for paper-author identification in KDD Cup 2013

Authors:
Chun-Liang Li;Yu-Chuan Su;Ting-Wei Lin;Cheng-Hao Tsai;Wei-Cheng Chang;Kuan-Hao Huang;Tzu-Ming Kuo;Shan-Wei Lin;Young-San Lin;Yu-Chen Lu;Chun-Pai Yang;Cheng-Xia Chang;Wei-Sheng Chin;Yu-Chin Juan;Hsiao-Yu Tung;Jui-Pin Wang;Cheng-Kuang Wei;Felix Wu;Tu-Chun Yin;Tong Yu;Yong Zhuang;Shou-de Lin;Hsuan-Tien Lin;Chih-Jen Lin
Affiliations:
National Taiwan University (all authors)
Venue:
Proceedings of the 2013 KDD Cup 2013 Workshop
Year:
2013

Abstract

The track 1 problem in KDD Cup 2013 is to discriminate papers confirmed by the given authors from the other, deleted papers. This paper describes the winning solution of team National Taiwan University for track 1 of KDD Cup 2013. First, we conduct feature engineering to transform the various provided text information into 97 features. Second, we train classification and ranking models using these features. Last, we combine our individual models to boost performance, using results on the internal validation set and the official Valid set. We also propose some effective post-processing techniques. Our solution achieves a MAP score of 0.98259 and ranks first on the private leaderboard of the Test set.
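
For readers unfamiliar with the metric, MAP (mean average precision) can be illustrated with a minimal Python sketch. This is not the competition's official scoring code: the function names, the dictionary layout, and the toy paper IDs are hypothetical, and the sketch assumes every confirmed paper of an author appears somewhere in that author's ranked list.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[int], confirmed: Set[int]) -> float:
    """Average precision of one author's ranked paper list.

    Assumption: every confirmed paper appears in `ranked`, so dividing
    by len(confirmed) normalizes the score to the range [0, 1].
    """
    if not confirmed:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, paper_id in enumerate(ranked, start=1):
        if paper_id in confirmed:
            hits += 1
            # Precision at this rank, counted only at confirmed papers.
            precision_sum += hits / rank
    return precision_sum / len(confirmed)


def mean_average_precision(
    rankings: Dict[int, List[int]], truth: Dict[int, Set[int]]
) -> float:
    """Mean of the per-author average precision values."""
    scores = [
        average_precision(papers, truth.get(author, set()))
        for author, papers in rankings.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: author id -> ranked paper ids, and confirmed sets.
rankings = {1: [10, 11, 12], 2: [20, 21]}
truth = {1: {10, 12}, 2: {21}}
map_score = mean_average_precision(rankings, truth)
```

In this toy case, author 1 scores (1/1 + 2/3) / 2 and author 2 scores 1/2, so the mean is 2/3. A ranking that places every confirmed paper ahead of every deleted one scores 1.0.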