Concept labeling: building text classifiers with minimal supervision

Authors:
Vijil Chenthamarakshan;Prem Melville;Vikas Sindhwani;Richard D. Lawrence
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY
Venue:
IJCAI'11 Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two
Year:
2011


Abstract

The rapid construction of supervised text classification models is becoming a pervasive need across many modern applications. To reduce human-labeling bottlenecks, many new statistical paradigms (e.g., active, semi-supervised, transfer and multi-task learning) have been vigorously pursued in recent literature with varying degrees of empirical success. Concurrently, the emergence of Web 2.0 platforms in the last decade has enabled a world-wide, collaborative human effort to construct a massive ontology of concepts with very rich, detailed and accurate descriptions. In this paper we propose a new framework to extract supervisory information from such ontologies and complement it with a shift in human effort from direct labeling of examples in the domain of interest to the much more efficient identification of concept-class associations. Through empirical studies on text categorization problems using the Wikipedia ontology, we show that this shift allows very high-quality models to be immediately induced at virtually no cost.
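
To make the idea of concept-class associations concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny hand-written stand-in for an ontology (the `CONCEPTS` dictionary), a hypothetical human-supplied mapping from concepts to classes (`CONCEPT_TO_CLASS`), and a simple nearest-centroid classifier with cosine similarity over word counts, chosen here only for illustration. The point it shows is that no document in the target domain is labeled directly: the only human input is the concept-to-class mapping.

```python
import math
from collections import Counter

# Hypothetical toy "ontology": concept name -> textual description.
# In the setting of the abstract these would come from Wikipedia articles.
CONCEPTS = {
    "Goalkeeper": "the goalkeeper defends the goal in football and hockey matches",
    "Penalty kick": "a penalty kick is a free shot at goal awarded after a foul in football",
    "Central bank": "a central bank sets interest rates and manages the money supply",
    "Stock market": "a stock market is where investors trade shares of companies",
}

# The only human supervision in this sketch: concept-class associations.
CONCEPT_TO_CLASS = {
    "Goalkeeper": "sports",
    "Penalty kick": "sports",
    "Central bank": "finance",
    "Stock market": "finance",
}


def bag_of_words(text):
    """Lowercase, whitespace-tokenized word counts."""
    return Counter(text.lower().split())


def build_class_profiles(concepts, concept_to_class):
    """Sum the word counts of all concept descriptions mapped to each class."""
    profiles = {}
    for name, description in concepts.items():
        label = concept_to_class[name]
        profiles.setdefault(label, Counter()).update(bag_of_words(description))
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, profiles):
    """Assign the class whose profile is most similar to the text."""
    vector = bag_of_words(text)
    return max(profiles, key=lambda label: cosine(vector, profiles[label]))


PROFILES = build_class_profiles(CONCEPTS, CONCEPT_TO_CLASS)

if __name__ == "__main__":
    print(classify("the keeper saved a penalty in the football match", PROFILES))
    print(classify("investors worry about interest rates and shares", PROFILES))
```

A realistic system would draw on many more concepts per class and a stronger learner; the sketch only illustrates where the supervision comes from.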