Induction of fuzzy decision trees
Fuzzy Sets and Systems
An information-theoretic perspective of tf—idf measures
Information Processing and Management: an International Journal
AdaRank: a boosting algorithm for information retrieval
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Scikit-learn: Machine Learning in Python
The Journal of Machine Learning Research
Trading Accuracy for Sparsity in Optimization Problems with Sparsity Constraints
SIAM Journal on Optimization
The Microsoft academic search dataset and KDD Cup 2013
Proceedings of the 2013 KDD Cup 2013 Workshop
Hi-index | 0.00 |
The ability to search literature and collect/aggregate metrics around publications is a central tool for modern research. Both academic and industry researchers across hundreds of scientific disciplines, from astronomy to zoology, increasingly rely on search to understand what has been published and by whom. Microsoft Academic Search is an open platform, which provides a variety of metrics and experiences for the research community, in addition to literature search. As the covering data came from many sources, the profile of an author with an ambiguous name tends to contain noise, resulting in papers that are incorrectly assigned to others. KDD Cup 2013 Track 1 challenges participants to determine which papers in an author profile were truly written by the given author. In this work, we present how to use tree-base models to accurately predict the paper author. We incorporate feature engineering into the models with the advantages of them. This paper introduces two kinds of tree-base models (GB-DT [4], RGF [5]) and presents in detail the learning algorithm and how features can be generated for the task. The experimental results show the effectiveness of the proposed approach.