End-user feature labeling: Supervised and semi-supervised approaches based on locally-weighted logistic regression

Authors:
Shubhomoy Das;Travis Moore;Weng-Keen Wong;Simone Stumpf;Ian Oberst;Kevin Mcintosh;Margaret Burnett
Affiliations:
Oregon State University, OR, USA;Oregon State University, OR, USA;Oregon State University, OR, USA;City University London, UK;Oregon State University, OR, USA;Oregon State University, OR, USA;Oregon State University, OR, USA
Venue:
Artificial Intelligence
Year:
2013


Abstract

When intelligent interfaces, such as intelligent desktop assistants, email classifiers, and recommender systems, customize themselves to a particular end user, inaccurate predictions can decrease productivity and increase frustration, especially in the early stages when training data is limited. The end user can improve the learning algorithm by labeling a substantial amount of additional training data, but this is tedious, takes time, and is too ad hoc to target a particular area of inaccuracy. To solve this problem, we propose new supervised and semi-supervised learning algorithms based on locally-weighted logistic regression for feature labeling by end users, enabling them to point out which features are important for a class rather than provide new training instances. We first evaluate our algorithms against other feature labeling algorithms under idealized conditions, using feature labels generated by an oracle. We then evaluate feature labeling algorithms under real-world conditions, using feature labels harvested from actual end users in our user study. Our user study is the first statistical user study for feature labeling involving a large number of end users (43 participants), none of whom had a background in machine learning. Our supervised and semi-supervised algorithms were among the best performers when compared to other feature labeling algorithms in the idealized setting, and they were also robust to the poor-quality feature labels provided by ordinary end users in our study. We also analyze the relative gains from incorporating the different sources of knowledge available: the labeled training set, the feature labels, and the unlabeled data. Together, our results strongly suggest that feature labeling by end users is both viable and effective for allowing end users to improve the learning algorithm behind their customized applications.
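
For readers unfamiliar with the base technique named in the abstract, the following is a minimal sketch of generic locally-weighted logistic regression, not the authors' algorithm. The abstract does not say how end-user feature labels or unlabeled data enter the model, so the sketch leaves both out. The Gaussian kernel, the bandwidth `tau`, the L2 penalty, the gradient-ascent settings, and all function names are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def kernel_weights(X, x_query, tau):
    # Assumed weighting: Gaussian kernel on squared Euclidean distance,
    # so training instances near the query count more in the local fit.
    weights = []
    for x in X:
        d2 = sum((a - b) ** 2 for a, b in zip(x, x_query))
        weights.append(math.exp(-d2 / (2.0 * tau ** 2)))
    return weights


def lwlr_predict(X, y, x_query, tau=1.0, lr=0.1, steps=500, l2=0.01):
    """Return P(y = 1 | x_query) from a logistic model fit around x_query."""
    w = kernel_weights(X, x_query, tau)
    total = sum(w) or 1.0  # avoid division by zero if all weights underflow
    n_features = len(x_query)
    theta = [0.0] * (n_features + 1)  # theta[0] is the bias term

    for _ in range(steps):
        grad = [0.0] * (n_features + 1)
        for xi, yi, wi in zip(X, y, w):
            z = theta[0] + sum(t * a for t, a in zip(theta[1:], xi))
            err = wi * (yi - sigmoid(z))  # weighted residual
            grad[0] += err
            for j, a in enumerate(xi):
                grad[j + 1] += err * a
        # Gradient ascent on the weighted log-likelihood; the L2 penalty
        # is applied to the feature coefficients but not to the bias.
        theta[0] += lr * grad[0] / total
        for j in range(1, n_features + 1):
            theta[j] += lr * (grad[j] / total - l2 * theta[j])

    z_query = theta[0] + sum(t * a for t, a in zip(theta[1:], x_query))
    return sigmoid(z_query)


if __name__ == "__main__":
    # Toy data: class 0 clusters near the origin, class 1 near (3.5, 3.5).
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [3, 3], [3, 4], [4, 3], [4, 4]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(lwlr_predict(X, y, [0.5, 0.5]))
    print(lwlr_predict(X, y, [3.5, 3.5]))
```

Because a separate model is fit for each query, the instance weights are the natural place to inject extra knowledge about which training instances matter for a given prediction; how the paper does this with feature labels is described in the full text rather than in the abstract.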