There is increasing interest in categorizing texts using learning algorithms. While the majority of approaches rely on learning linear classifiers, there is also some interest in describing document categories by text patterns. We introduce a model for learning patterns for text categorization (the LPT-model) that does not rely on an attribute-value representation of documents but represents documents essentially "as they are". Based on the LPT-model, we focus on learning patterns within a relatively simple pattern language. We compare different search heuristics and pruning methods known from various symbolic rule learners on a set of representative text categorization problems. The best results were obtained using the m-estimate as the search heuristic combined with the likelihood-ratio statistic for pruning. Even better results can be obtained when the likelihood-ratio statistic is replaced by a new pruning measure, which we call the l-measure. In contrast to conventional measures for pruning, the l-measure takes into account properties of the search space.
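For readers unfamiliar with the two measures named above, the following is a minimal sketch of how they are commonly defined in the symbolic rule-learning literature. The abstract itself gives no formulas, so the definitions, the function names, the parameter names (`p`, `n`, `P`, `N`, `m`), and the default `m=2.0` are all assumptions made for illustration and may differ from what the paper uses. The l-measure is not sketched because the abstract does not define it.

```python
import math


def m_estimate(p, n, P, N, m=2.0):
    """Common textbook form of the m-estimate (an assumption, not taken from the paper).

    p, n: positive / negative examples covered by the pattern
    P, N: positive / negative examples in the whole training set
    m:    weight given to the prior class probability P / (P + N)
    """
    prior = P / (P + N)
    return (p + m * prior) / (p + n + m)


def likelihood_ratio_statistic(p, n, P, N):
    """Common textbook form of the likelihood-ratio statistic (an assumption).

    Compares the class distribution among covered examples with the
    distribution expected if the pattern covered examples at random.
    """
    covered = p + n
    if covered == 0:
        return 0.0
    expected_p = covered * P / (P + N)
    expected_n = covered * N / (P + N)
    total = 0.0
    for observed, expected in ((p, expected_p), (n, expected_n)):
        if observed > 0:  # a zero count contributes nothing to the sum
            total += observed * math.log(observed / expected)
    return 2.0 * total


if __name__ == "__main__":
    # Hypothetical counts: a pattern covering 8 positive and 2 negative
    # documents in a training set of 50 positive and 50 negative documents.
    print(m_estimate(8, 2, 50, 50))
    print(likelihood_ratio_statistic(8, 2, 50, 50))
```

In this reading, the m-estimate ranks candidate patterns during search, while the likelihood-ratio statistic is compared against a significance threshold to decide whether a pattern should be pruned. Both depend only on coverage counts, which is the property the abstract contrasts with the l-measure.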