ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
Thresholding strategies in automated text categorization are an underexplored area of research. They are often treated as a post-processing step of minor importance, on the assumptions that they make no difference to the performance of a classifier and that finding the optimal thresholding strategy for a given classifier is trivial. Neither of these assumptions is true. In this paper, we concentrate on progressive filtering, a hierarchical text categorization technique that relies on a local-classifier-per-node approach, thus mimicking the underlying taxonomy of categories. The focus of the paper is on assessing TSA, a greedy threshold selection algorithm, against a relaxed brute-force algorithm and the most relevant state-of-the-art algorithms. Experiments performed on Reuters confirm the validity of TSA.
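To make the setting concrete, the following is a minimal sketch of threshold selection for a single pipeline of per-node classifiers. It assumes that a document is accepted only if its score at every node along a root-to-leaf path clears that node's threshold. It compares an exhaustive grid search with a generic one-level-at-a-time greedy search. The greedy routine is an illustrative stand-in and is not the TSA evaluated in the paper, whose details the abstract does not give. All names, the toy data, and the choice of F1 as the objective are assumptions.

```python
from itertools import product


def accepts(scores, thresholds):
    """Accept a document only if every node-level score along the
    pipeline clears that node's threshold (assumed filtering rule)."""
    return all(s >= t for s, t in zip(scores, thresholds))


def f1(docs, thresholds):
    """F1 of the pipeline on docs, a list of (scores, is_relevant) pairs."""
    tp = fp = fn = 0
    for scores, relevant in docs:
        predicted = accepts(scores, thresholds)
        if predicted and relevant:
            tp += 1
        elif predicted:
            fp += 1
        elif relevant:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def brute_force(docs, grid, depth):
    """Try every combination of grid values: len(grid) ** depth evaluations."""
    return max(product(grid, repeat=depth), key=lambda th: f1(docs, th))


def greedy(docs, grid, depth):
    """Fix one threshold at a time, leaf first (an arbitrary order chosen
    for this sketch): len(grid) * depth evaluations, no optimality guarantee."""
    thresholds = [grid[0]] * depth
    for level in reversed(range(depth)):
        def score_with(t, level=level):
            trial = thresholds[:level] + [t] + thresholds[level + 1:]
            return f1(docs, trial)
        thresholds[level] = max(grid, key=score_with)
    return tuple(thresholds)


# Hypothetical toy data: (score at root node, score at leaf node), relevance.
docs = [
    ((0.9, 0.8), True),
    ((0.7, 0.6), True),
    ((0.8, 0.3), False),
    ((0.4, 0.9), False),
    ((0.2, 0.1), False),
]
grid = [0.0, 0.25, 0.5, 0.75]

best_exhaustive = brute_force(docs, grid, depth=2)
best_greedy = greedy(docs, grid, depth=2)
```

The exhaustive search grows exponentially with the depth of the pipeline, while the greedy search grows linearly. That gap in cost is why a greedy selection algorithm is worth assessing against a brute-force baseline.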