Document classification through interactive supervision of document and term labels

Authors:
Shantanu Godbole; Abhay Harpale; Sunita Sarawagi; Soumen Chakrabarti
Affiliations:
IIT Bombay, Powai, Mumbai, 400076, India (all authors)
Venue:
PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2004

Abstract

Effective incorporation of human expertise, while exerting a low cognitive load, is a critical aspect of real-life text classification applications that is not adequately addressed by batch-supervised high-accuracy learners. Standard text classifiers are supervised in only one way: assigning labels to whole documents. They are thus deprived of the enormous wisdom that humans carry about the significance of words and phrases in context. We present HIClass, an interactive and exploratory labeling package that actively collects user opinion on feature representations and choices, as well as whole-document labels, while minimizing redundancy in the input sought. Preliminary experience suggests that, starting with essentially an unlabeled corpus, very little cognitive labor suffices to set up a labeled collection on which standard classifiers perform well.