Explaining data-driven document classifications

Authors:
David Martens; Foster Provost
Affiliations:
Department of Engineering Management, Faculty of Applied Economics, University of Antwerp, Antwerp, Belgium; Department of Information, Operations and Management Sciences, Stern School of Business, New York University, New York, NY
Venue:
MIS Quarterly
Year:
2014

Abstract

In many document classification applications, managers, client-facing employees, and the technical team need to understand the reasons for data-driven classification decisions. Predictive models treat documents as data to be classified, and document data are characterized by very high dimensionality, often with tens of thousands to millions of variables (words). Because of this high dimensionality, the decisions made by document classifiers are very difficult to understand. This paper begins by extending the most relevant prior theoretical model of explanations for intelligent systems to account for some missing elements. The main theoretical contribution is the definition of a new kind of explanation: a minimal set of words (more generally, terms) such that removing all words in the set from the document changes the predicted class from the class of interest. We present an algorithm to find such explanations, as well as a framework to assess such an algorithm's performance. We demonstrate the value of the new approach with a case study from a real-world document classification task: classifying web pages as containing objectionable content, so that advertisers can choose not to have their ads appear on those pages. A second empirical demonstration, on news-story topic classification, shows that the explanations are concise and document-specific, and that they can provide understanding of the exact reasons for the classification decisions, of the workings of the classification models, and of the business application itself. We also illustrate how explaining the classifications of documents can help to improve data quality and model performance.
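The definition of an explanation given in the abstract can be illustrated with a small sketch. This is not the algorithm presented in the paper: it substitutes a plain exhaustive search over word subsets of increasing size, which returns a smallest set (and therefore a minimal one) but would not scale to real vocabularies. The toy linear model, its weights, and the names `WEIGHTS`, `BIAS`, `predict`, and `find_explanation` are all assumptions made up for illustration.

```python
from itertools import combinations

# Hypothetical toy linear model; the weights and bias are invented for this sketch.
WEIGHTS = {"casino": 2.0, "poker": 1.5, "bonus": 0.25, "news": -0.25}
BIAS = -1.0


def predict(words):
    """Return 1 (the class of interest) if the linear score is positive, else 0."""
    score = BIAS + sum(WEIGHTS.get(w, 0.0) for w in set(words))
    return 1 if score > 0 else 0


def find_explanation(words, classify, max_size=5):
    """Try word subsets by increasing size and return the first one whose
    removal changes the predicted class, or None if none is found."""
    vocab = sorted(set(words))
    original = classify(vocab)
    for size in range(1, min(max_size, len(vocab)) + 1):
        for subset in combinations(vocab, size):
            remaining = [w for w in vocab if w not in subset]
            if classify(remaining) != original:
                return set(subset)
    return None


doc = ["casino", "poker", "bonus", "news"]
explanation = find_explanation(doc, predict)
print(explanation)
```

In this toy setting no single word changes the prediction when removed, so the search moves on to pairs and stops at the first pair that does. Because subsets are tried in order of size, no smaller set can be an explanation, which matches the minimality requirement in the definition.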