We present results for automated text categorization of the Reuters-810000 collection of news stories. Our experiments use the entire one-year collection of 810,000 stories and the entire subject index. We divide the data into monthly groups and provide an initial benchmark of text categorization performance on the complete collection. Experimental results show that efficient sparse-feature implementations of linear methods and decision trees, using a global unstemmed dictionary, can readily handle applications of this size. Predictive performance is approximately as strong as the best results reported for the much smaller, older Reuters collections. Detailed results are provided for individual time periods. We show that a smaller time horizon does not appreciably diminish predictive quality, implying reduced retraining demands when the sample size is large.
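To make the setup concrete, here is a minimal sketch of what a sparse-feature linear classifier over a global unstemmed dictionary can look like. The whitespace tokenizer, the perceptron update rule, and the toy examples are illustrative assumptions; the abstract does not specify the authors' actual feature pipeline or training algorithm.

```python
from collections import defaultdict

def tokenize(text):
    # "Unstemmed" dictionary: lowercase whitespace tokens, no stemming
    # or other normalization.
    return text.lower().split()

def build_dictionary(docs):
    # Global dictionary mapping each term to a feature index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab

def sparse_vector(doc, vocab):
    # Sparse representation: only nonzero term counts are stored,
    # so per-document cost is proportional to document length,
    # not dictionary size.
    vec = defaultdict(int)
    for tok in tokenize(doc):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec

def train_perceptron(examples, vocab, epochs=10):
    # One linear model per category (binary labels in {+1, -1}).
    # The weight vector is dense, but each update touches only a
    # document's nonzero features -- the key to handling large corpora.
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(epochs):
        for doc, label in examples:
            x = sparse_vector(doc, vocab)
            score = b + sum(w[i] * c for i, c in x.items())
            if label * score <= 0:  # misclassified or on the boundary
                for i, c in x.items():
                    w[i] += label * c
                b += label
    return w, b

def predict(doc, vocab, w, b):
    x = sparse_vector(doc, vocab)
    return 1 if b + sum(w[i] * c for i, c in x.items()) > 0 else -1
```

In a multi-label setting like the Reuters subject index, one such binary model would be trained per category against the shared global dictionary.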