Uncertainty-Based Noise Reduction and Term Selection in Text Categorization
Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval
This paper discusses the notion of Uncertainty, which has a prominent place in the theory and experimental practice of modern Physics. It argues that awareness of Uncertainty may also be of tremendous importance to the field of Information Retrieval, and to Text Categorization in particular.

As an application of Uncertainty in Text Categorization, a new criterion for Term Selection is described that is based on the Uncertainty in Term Frequency across categories. This criterion makes it possible to distinguish between low-quality (or "noisy") and high-quality ("stiff") terms.

We describe an experiment investigating the effect of eliminating noisy and stiff terms in the context of text classification. In the experiment, we applied the Rocchio and Winnow classification algorithms to a collection of newspaper items: a mono-classified subset of the well-known Reuters-21578 corpus. This investigation shows that both the local elimination of noisy terms and the global elimination of stiff terms can be used for Term Selection in Text Categorization.
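The abstract does not spell out the criterion itself, so the following is only a rough illustration under one plausible reading, not the paper's definition. It assumes that a term's uncertainty in a category is the standard error of its mean per-document frequency divided by that mean. A term above a hypothetical `noisy_above` threshold is treated as noisy in that category (local elimination). A term below a hypothetical `stiff_below` threshold in every category where it occurs is treated as stiff (global elimination). The measure, both thresholds, the "every category" rule, and the helper names `relative_uncertainty` and `select_terms` are all assumptions made for this sketch.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical input format: each document is a mapping term -> raw count.
Doc = Dict[str, int]


def relative_uncertainty(freqs: List[int]) -> Optional[float]:
    """Standard error of the mean frequency divided by the mean (assumed measure).

    Returns None when the measure is undefined: fewer than two documents,
    or the term never occurs in any of them (mean of zero).
    """
    n = len(freqs)
    if n < 2:
        return None
    mean = sum(freqs) / n
    if mean == 0:
        return None
    var = sum((f - mean) ** 2 for f in freqs) / (n - 1)  # sample variance
    return math.sqrt(var / n) / mean


def select_terms(
    docs_by_cat: Dict[str, List[Doc]],
    noisy_above: float,  # hypothetical threshold for "noisy"
    stiff_below: float,  # hypothetical threshold for "stiff"
) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Return (noisy terms per category, terms that are stiff everywhere they occur)."""
    vocab = {t for docs in docs_by_cat.values() for d in docs for t in d}
    noisy: Dict[str, Set[str]] = {}
    stiff_count = {t: 0 for t in vocab}  # categories where the term is stiff
    occur_count = {t: 0 for t in vocab}  # categories where the measure is defined
    for cat, docs in docs_by_cat.items():
        noisy[cat] = set()
        for t in vocab:
            # Documents that lack the term contribute a frequency of zero.
            u = relative_uncertainty([d.get(t, 0) for d in docs])
            if u is None:
                continue
            occur_count[t] += 1
            if u > noisy_above:
                noisy[cat].add(t)  # candidate for local elimination in this category
            elif u < stiff_below:
                stiff_count[t] += 1
    # Global rule (assumed): stiff in every category where the term occurs.
    stiff_global = {
        t for t in vocab if occur_count[t] > 0 and stiff_count[t] == occur_count[t]
    }
    return noisy, stiff_global
```

In a toy corpus where "the" occurs five times in every document of two categories, its relative uncertainty is 0 everywhere, so it lands in the global stiff set. A term that occurs once in a single document of a three-document category has a relative uncertainty of 1.0 and is flagged as noisy only in that category. Treating the two filters separately mirrors the abstract's distinction between local and global elimination; the actual thresholds and measure used in the experiment are not given here.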