An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Simple BM25 extension to multiple weighted fields
Proceedings of the thirteenth ACM international conference on Information and knowledge management
ACM SIGKDD Explorations Newsletter
Classification in Networked Data: A Toolkit and a Univariate Case Study
The Journal of Machine Learning Research
Combining Collective Classification and Link Prediction
ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops
Web page classification: Features and algorithms
ACM Computing Surveys (CSUR)
Hi-index | 0.00 |
The issue of the automatic classification of research articles into one or more fields of science is of primary importance for scientific databases and digital libraries. A sophisticated classification strategy renders searching more effective and assists the users in locating similar relevant items. Although the most publishing services require from the authors to categorize their articles themselves, there are still cases where older documents remain unclassified, or the taxonomy changes over time. In this work we attempt to address this interesting problem by introducing a machine learning algorithm which combines several parameters and meta-data of a research article. In particular, our model exploits the training set to correlate keywords, authors, co-authorship, and publishing journals to a number of labels of the taxonomy. In the sequel, it applies this information to classify the rest of the documents. The experiments we have conducted with a large dataset comprised of about 1,5 million articles, demonstrate that in this specific application, our model outperforms the AdaBoost.MH and SVM methods.