Toward Exploratory Test-Instance-Centered Diagnosis in High-Dimensional Classification

Authors:
Charu C. Aggarwal
Affiliations:
IEEE
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2007


Abstract

High-dimensional data is a difficult case for most subspace-based classification methods because a large number of combinations of dimensions can have discriminatory power. The number of combinations of dimensions that could decide the correct class of an instance is exponential, and the relevant combination can vary with data locality and with the test instance. Most summarized models, such as decision trees and rule-based systems, therefore aim only at a global summary of the data, which is then used for classification. Because such a summary is incomplete, a particular classification model may be more or less suited to individual test instances. Furthermore, it may not provide sufficient insight into the most representative characteristics of a particular test instance. This is undesirable for the many classification applications in which the diagnostic reasoning behind the classification of a test instance is as important as the classification itself. In an interactive application, a user may find it more valuable to have a diagnostic decision support method that reveals significant classification behaviors of exemplar records. Such an approach has the additional advantage of optimizing the decision process for the individual record, which leads to more effective classification. In this paper, we propose the Subspace Decision Path (SD-Path) method, which lets the user interactively explore a small number of nodes of a hierarchical decision process so that the most significant classification characteristics of a given test instance are revealed. In addition, the SD-Path method improves interpretability by constructing views of the data in which the different classes are clearly separated. Even in difficult cases where the classification behavior of the test instance is ambiguous, the SD-Path method provides a diagnostic understanding of the characteristics that cause this ambiguity. The method thus combines the abilities of the human and the computer to create an effective diagnostic tool for instance-centered high-dimensional classification.
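
The idea that the discriminating combination of dimensions depends on the test instance can be illustrated with a small sketch. The code below is not the SD-Path method, whose details the abstract does not give. It is a simplified stand-in, under the assumption that a "view" is an axis-parallel 2-D projection and that a good view is one in which the nearest neighbors of the test instance mostly share one class. The function names `local_purity` and `best_view`, the parameter `k`, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def local_purity(points, labels, test, dims, k):
    """Majority-class share among the k nearest neighbors of `test`,
    with distance measured only over the dimensions in `dims`."""
    def dist(p):
        return math.sqrt(sum((p[d] - test[d]) ** 2 for d in dims))

    nearest = sorted(range(len(points)), key=lambda i: dist(points[i]))[:k]
    counts = Counter(labels[i] for i in nearest)
    label, votes = counts.most_common(1)[0]
    return votes / k, label


def best_view(points, labels, test, k=3):
    """Return (purity, dims, label) for the 2-D axis-parallel projection
    in which the neighborhood of `test` is most dominated by one class.
    Exhaustive search over pairs; an assumption made for illustration."""
    best = None
    for dims in combinations(range(len(test)), 2):
        purity, label = local_purity(points, labels, test, dims, k)
        if best is None or purity > best[0]:
            best = (purity, dims, label)
    return best


# Toy data: dimensions 0 and 1 separate the classes, dimension 2 is noise.
points = [
    (0.0, 0.0, 5.0), (0.1, 0.2, 0.0), (0.2, 0.1, 9.0),   # class A
    (1.0, 1.0, 5.1), (0.9, 1.1, 0.2), (1.1, 0.9, 8.8),   # class B
]
labels = ["A", "A", "A", "B", "B", "B"]
test = (0.1, 0.1, 5.0)

print(best_view(points, labels, test, k=3))
```

On this toy data the chosen view is the pair of dimensions (0, 1), where all three neighbors of the test instance belong to class A. The projections that include the noisy dimension 2 give a mixed neighborhood. An exhaustive search over pairs is only workable for a handful of dimensions, which is one reason an interactive, user-guided exploration of the kind the abstract describes may be preferable in high-dimensional settings.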