In this paper, we introduce efficient methods for incremental multi-label categorization of documents. We use concept clumping to categorize news articles efficiently into a hierarchical structure of categories. Concept clumping is a phenomenon of local coherence in the data, and it has previously been used for fast, incremental e-mail classification. We extend the definition of clumping and introduce additional clumping metrics designed specifically for multi-label document categorization. We present three methods for incremental multi-label categorization that exploit concept clumping and make use of thresholding techniques and a new term-category weight boosting method. Our methods are tested on the Reuters (RCV1) news corpus. The accuracy obtained is comparable to that of some well-known machine learning methods trained in batch mode, but with much lower computation time.
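To make the incremental, thresholded multi-label setting more concrete, the following is a minimal sketch and not the paper's method. It assumes a simple scoring rule (each term votes for a category in proportion to how often the two have co-occurred) and a relative score threshold. The class name, the `threshold` parameter, and the scoring rule are all illustrative assumptions, and the clumping metrics and the term-category weight boosting described above are not reproduced.

```python
from collections import Counter, defaultdict


class IncrementalMultiLabelCategorizer:
    """Hypothetical sketch: term statistics plus a relative score threshold.

    This is an illustration of the general setting only. It is not the
    algorithm from the paper.
    """

    def __init__(self, threshold=0.6):
        # Assumed parameter: keep every category scoring at least
        # threshold * (best category score).
        self.threshold = threshold
        self.term_counts = Counter()               # term -> documents seen
        self.term_cat_counts = defaultdict(Counter)  # term -> category -> documents

    def predict(self, terms):
        """Return the set of categories whose score passes the threshold."""
        scores = Counter()
        for term in set(terms):
            seen = self.term_counts[term]
            if seen == 0:
                continue  # unseen terms carry no evidence
            for category, count in self.term_cat_counts[term].items():
                scores[category] += count / seen
        if not scores:
            return set()
        cutoff = self.threshold * max(scores.values())
        return {c for c, s in scores.items() if s >= cutoff}

    def update(self, terms, labels):
        """Fold one labelled document into the statistics."""
        for term in set(terms):
            self.term_counts[term] += 1
            for category in labels:
                self.term_cat_counts[term][category] += 1

    def process(self, terms, labels):
        """Predict first, then learn from the true labels (test-then-train)."""
        predicted = self.predict(terms)
        self.update(terms, labels)
        return predicted


clf = IncrementalMultiLabelCategorizer(threshold=0.6)
first = clf.process(["oil", "price"], {"markets", "energy"})
second = clf.process(["oil", "export"], {"energy"})
```

Each document is categorized before its true labels are folded into the counts, so no separate batch training pass is needed. Varying the hypothetical `threshold` trades the number of labels assigned per document against precision.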