Intelligent assistants are handling increasingly critical tasks, but until now, end users have had no way to systematically assess where their assistants make mistakes. For some intelligent assistants, this is a serious problem: if the assistant is doing work that is important, such as assisting with qualitative research or monitoring an elderly parent's safety, the user may pay a high cost for unnoticed mistakes. This paper addresses the problem with WYSIWYT/ML (What You See Is What You Test for Machine Learning), a human/computer partnership that enables end users to systematically test intelligent assistants. Our empirical evaluation shows that WYSIWYT/ML helped end users find assistants' mistakes significantly more effectively than ad hoc testing. Not only did it allow users to assess an assistant's work on an average of 117 predictions in only 10 minutes, it also scaled to a much larger data set, assessing an assistant's work on 623 out of 1,448 predictions using only the users' original 10 minutes' testing effort.
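The abstract reports what WYSIWYT/ML achieves but not how it decides which of an assistant's predictions a user should examine first. As a rough, minimal sketch only (the prioritization criterion, function names, and data layout below are illustrative assumptions, not the authors' implementation), one way such a human/computer partnership could spend a 10-minute testing budget is to surface the predictions the classifier is least confident about and record the user's verdicts as a simple "testedness" measure:

    # Minimal sketch: rank an assistant's predictions for user testing by how
    # uncertain the classifier is about them (least confident first).
    # All names and the confidence-based criterion are assumptions for
    # illustration; they are not taken from the paper.
    from typing import Dict, List, Tuple

    Prediction = Tuple[str, str, float]  # (item_id, predicted_label, confidence)

    def prioritize_for_testing(predictions: List[Prediction],
                               budget: int) -> List[Prediction]:
        """Return the `budget` predictions the user should check first.
        Least-confident predictions are the most likely mistakes, so they
        get the user's limited attention first."""
        return sorted(predictions, key=lambda p: p[2])[:budget]

    def record_judgment(coverage: Dict[str, bool],
                        item_id: str, is_correct: bool) -> None:
        """Record the user's verdict; the aggregate acts as a rough measure
        of how much of the assistant's output has been tested."""
        coverage[item_id] = is_correct

    # Example: a short testing session covers only a slice of the data set.
    preds = [("msg-1", "work", 0.93), ("msg-2", "personal", 0.51),
             ("msg-3", "work", 0.78)]
    to_check = prioritize_for_testing(preds, budget=2)   # msg-2, then msg-3
    coverage: Dict[str, bool] = {}
    record_judgment(coverage, "msg-2", is_correct=False)

In this reading, the scaling result in the abstract (623 of 1,448 predictions assessed from 10 minutes of effort) would come from the system generalizing the user's recorded judgments to untested predictions, rather than from the user checking each one by hand.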