An interactive algorithm for asking and incorporating feature feedback into support vector machines

Authors:
Hema Raghavan;James Allan
Affiliations:
Yahoo! Inc;University of Massachusetts
Venue:
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2007

Abstract

Standard machine learning techniques typically require ample training data in the form of labeled instances. In many situations it may be too tedious or costly to obtain enough labeled data for adequate classifier performance. However, in text classification, humans can easily guess the relevance of features, that is, words that are indicative of a topic, thereby enabling the classifier to focus its feature weights more appropriately when labeled data is scarce. We describe an algorithm for tandem learning that begins with a couple of labeled instances and then, at each iteration, recommends features and instances for a human to label. Tandem learning using an "oracle" results in much better performance than learning on only features or only instances. We find that humans can emulate the oracle to an extent that results in performance (accuracy) comparable to that of the oracle. Our unique experimental design helps factor out system error from human error, leading to a better understanding of when and why interactive feature selection works.
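
The interactive loop outlined in the abstract can be sketched roughly as follows. This is an illustrative toy, not the authors' algorithm: a perceptron stands in for the support vector machine, the instance to label is chosen by a simple uncertainty heuristic, features are recommended by weight magnitude, and feature feedback is applied by scaling up the features the labeler marks as relevant. All names (`tandem_learn`, `label_of`, `is_relevant`, `boost`) and the six-document data set are hypothetical and exist only to make the example runnable.

```python
# Toy sketch of a tandem-learning loop. Everything here is an assumption made
# for illustration: a perceptron replaces the SVM, and feature feedback is
# applied by rescaling features that the (simulated) labeler marks relevant.

VOCAB = ["goal", "match", "vote", "senate", "the", "today"]
POOL = [  # word counts per document, columns follow VOCAB
    [2, 1, 0, 0, 1, 0],
    [0, 0, 2, 1, 1, 1],
    [1, 2, 0, 0, 0, 1],
    [0, 0, 1, 2, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
]
LABELS = [1, -1, 1, -1, 1, -1]   # +1 = sports, -1 = politics (hypothetical)
RELEVANT = {0, 1, 2, 3}          # word indices the simulated oracle accepts


def dot(w, scale, x):
    """Score a document whose features are rescaled by `scale`."""
    return sum(wi * si * xi for wi, si, xi in zip(w, scale, x))


def train(X, y, scale, epochs=20):
    """Perceptron stand-in for the SVM, trained on rescaled features."""
    w = [0.0] * len(scale)
    for _ in range(epochs):
        for x, label in zip(X, y):
            if label * dot(w, scale, x) <= 0:  # wrong side or on the boundary
                w = [wi + label * si * xi for wi, si, xi in zip(w, scale, x)]
    return w


def tandem_learn(pool, label_of, is_relevant, seeds, rounds=3, k=2, boost=10.0):
    """Alternate instance queries and feature queries for `rounds` iterations."""
    labeled = {i: label_of(i) for i in seeds}
    scale = [1.0] * len(pool[0])
    asked = set()
    for _ in range(rounds):
        ids = sorted(labeled)
        w = train([pool[i] for i in ids], [labeled[i] for i in ids], scale)
        # Instance query: ask about the document the model is least sure of.
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if unlabeled:
            q = min(unlabeled, key=lambda i: abs(dot(w, scale, pool[i])))
            labeled[q] = label_of(q)
        # Feature queries: recommend the k heaviest features not yet asked.
        candidates = [j for j in range(len(scale)) if j not in asked]
        candidates.sort(key=lambda j: -abs(w[j] * scale[j]))
        for j in candidates[:k]:
            asked.add(j)
            if is_relevant(j):
                scale[j] = boost  # emphasize features confirmed as relevant
    ids = sorted(labeled)
    w = train([pool[i] for i in ids], [labeled[i] for i in ids], scale)
    return w, scale, labeled


w, scale, labeled = tandem_learn(
    POOL,
    label_of=lambda i: LABELS[i],        # simulated instance oracle
    is_relevant=lambda j: j in RELEVANT,  # simulated feature oracle
    seeds=[0, 1],
)
print("labeled documents:", sorted(labeled))
print("feature scales:", dict(zip(VOCAB, scale)))
```

In this sketch both oracles are simulated with lookups; in an interactive setting `label_of` and `is_relevant` would instead prompt a person, which is where human error (as opposed to system error) would enter.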