We introduce the evolving label-set problem encountered in building real-world text classification systems. This problem arises when a text classification system trained on a label-set encounters documents of unseen classes at deployment time. We design a Class-Detector module that monitors unlabeled data, detects new classes, and suggests them to the administrator for inclusion in the label-set. We propose abstractions that group tokens under human-understandable concepts and provide a mechanism for assigning importance to unseen terms. We present generative algorithms that leverage the notion of the support of a document in a model for (1) selecting documents of proposed new classes, and (2) automatically triggering detection of new classes. Experiments on three real-world taxonomies show that our methods select new-class documents with high precision and trigger new-class detection with low false-positive and false-negative rates.
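The two uses of support named above can be illustrated with a minimal sketch. The abstract does not define support, so the sketch assumes one plausible reading: the best per-token average log-likelihood of a document under any known class, with each class modeled as a Laplace-smoothed unigram distribution. The function names, the `threshold` and `min_fraction` parameters, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_class_models(labeled_docs, alpha=1.0):
    """Fit one smoothed unigram model per known class.

    labeled_docs: list of (tokens, label) pairs.
    ASSUMPTION: Laplace smoothing with one extra slot reserved for unseen terms.
    """
    vocab = {tok for tokens, _ in labeled_docs for tok in tokens}
    counts = {}
    for tokens, label in labeled_docs:
        counts.setdefault(label, Counter()).update(tokens)
    models = {}
    for label, c in counts.items():
        total = sum(c.values()) + alpha * (len(vocab) + 1)
        models[label] = {
            "logp": {tok: math.log((c[tok] + alpha) / total) for tok in vocab},
            "unseen": math.log(alpha / total),
        }
    return models


def support(tokens, models):
    """ASSUMED definition: best per-token average log-likelihood over known classes."""
    if not tokens:
        return float("-inf")
    best = float("-inf")
    for m in models.values():
        ll = sum(m["logp"].get(tok, m["unseen"]) for tok in tokens)
        best = max(best, ll / len(tokens))
    return best


def select_new_class_candidates(unlabeled_docs, models, threshold):
    """(1) Select documents that no known class supports well."""
    return [doc for doc in unlabeled_docs if support(doc, models) < threshold]


def should_trigger(unlabeled_docs, models, threshold, min_fraction=0.2):
    """(2) Trigger detection once enough of the stream is poorly supported."""
    if not unlabeled_docs:
        return False
    low = select_new_class_candidates(unlabeled_docs, models, threshold)
    return len(low) / len(unlabeled_docs) >= min_fraction


# Toy data, invented for illustration only.
training = [
    (["goal", "match", "team"], "sports"),
    (["team", "goal", "score"], "sports"),
    (["stock", "market", "bank"], "finance"),
    (["bank", "stock", "price"], "finance"),
]
models = train_class_models(training)
stream = [
    ["goal", "team", "match"],
    ["virus", "vaccine", "outbreak"],
    ["vaccine", "outbreak", "virus"],
]
candidates = select_new_class_candidates(stream, models, threshold=-2.5)
triggered = should_trigger(stream, models, threshold=-2.5)
```

In this sketch a single threshold serves both selection and triggering. A real system would likely tune the two separately, and could replace the smoothing constant with a more informed weight for unseen terms.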