Text categorization algorithms have a large number of parameters that determine their behaviour. The effect of these parameters is hard to predict, objectively or intuitively, and may well depend on the corpus and on the document representation. In practice, parameter values are usually copied from previously published results, which may yield less than optimal accuracy on a particular corpus. In this paper we investigate the effect of parameter tuning on the accuracy of two text categorization algorithms: the well-known Rocchio algorithm and the lesser-known Winnow. We show that the optimal parameter values for a specific corpus sometimes differ greatly from those found in the literature, that the effect of individual parameters is corpus-dependent, and that parameter tuning can substantially improve the accuracy of both Winnow and Rocchio. We argue that this dependence of the categorization algorithms on experimentally established parameter values makes it hard to compare the outcomes of different experiments, and we propose the automatic determination of optimal parameters on the training set as a solution.
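The proposal above, determining optimal parameters on the training set alone, can be sketched as a grid search over Rocchio's weighting parameters (commonly called beta and gamma) using a held-out slice of the training data. The toy data, the parameter grid, and all helper names below are illustrative assumptions, not taken from the paper:

```python
# Hedged sketch: tune Rocchio's beta/gamma by grid search on a held-out
# part of the training set, never touching the test set. All names and
# values here are illustrative assumptions.
from itertools import product

def rocchio_prototype(pos, neg, beta, gamma):
    """Prototype vector: beta * mean(positives) - gamma * mean(negatives)."""
    dim = len(pos[0])
    proto = [0.0] * dim
    for i in range(dim):
        p = sum(v[i] for v in pos) / len(pos)
        n = sum(v[i] for v in neg) / len(neg)
        proto[i] = beta * p - gamma * n
    return proto

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def accuracy(proto, docs, labels, threshold=0.0):
    """Classify by sign of the similarity score and compare to labels."""
    preds = [dot(proto, d) > threshold for d in docs]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def tune(train_docs, train_labels, betas, gammas):
    """Pick (beta, gamma) maximizing accuracy on a validation split."""
    cut = int(0.8 * len(train_docs))
    tr_d, va_d = train_docs[:cut], train_docs[cut:]
    tr_l, va_l = train_labels[:cut], train_labels[cut:]
    pos = [d for d, l in zip(tr_d, tr_l) if l]
    neg = [d for d, l in zip(tr_d, tr_l) if not l]
    best_params, best_acc = None, -1.0
    for beta, gamma in product(betas, gammas):
        proto = rocchio_prototype(pos, neg, beta, gamma)
        acc = accuracy(proto, va_d, va_l)
        if acc > best_acc:
            best_params, best_acc = (beta, gamma), acc
    return best_params, best_acc
```

A more robust variant would average the score over cross-validation folds rather than a single split; the same scheme applies to Winnow's parameters (promotion and demotion factors, threshold) by swapping in a Winnow trainer.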