Feature space transformation and decision results interpretation

Authors:
Jinyan Li; Hwee-Leng Ong
Affiliations:
Laboratories for Information Technology, 21 Heng Mui Keng Terrace, Singapore; Laboratories for Information Technology, 21 Heng Mui Keng Terrace, Singapore
Venue:
APBC '03 Proceedings of the First Asia-Pacific bioinformatics conference on Bioinformatics 2003 - Volume 19
Year:
2003

Abstract

Gene expression profiles and proteomic data are extremely high-dimensional. Although support vector machines can learn the internal relationships in such data well enough for accurate classification, their non-linear kernel functions make it difficult to explain the reasons behind a prediction to non-specialists. We prefer rule-based methods because they are easy to interpret. In this paper, we first discuss feature space transformation. Each new feature (a rule) is a combination of multiple original features, chosen so that it occurs in a large percentage of the samples of one class but in none of the samples of the other class. When described in terms of these new features, the training or test data become clearly class-separable. We then discuss a more sophisticated rule-based classification method called PCL. PCL produces easily explainable classification scores that help us better understand both the predictions and the test data themselves. We also use visualization to aid understanding of the classifier output. We illustrate our main points with numerous examples.
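The abstract describes the feature space transformation only at a high level. As a rough illustration, the Python sketch below works on a small hypothetical data set of discretized expression values (the genes g1 to g3, the classes A and B, and all sample values are invented). Each new feature is a conjunction of conditions that covers at least a chosen fraction of one class and never occurs in the other. Rules of this kind are often called jumping emerging patterns. Each sample is then re-encoded as a binary vector of rule matches. The brute-force enumeration, the `max_len` and `min_support` thresholds, and the helper names are assumptions made for this example. It is not the authors' mining algorithm, and it is not PCL.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

# A sample is a set of discretized conditions, e.g. {"g1=high", "g2=low"} (hypothetical encoding).
Sample = FrozenSet[str]
Rule = FrozenSet[str]  # a rule is a conjunction of such conditions


def support(rule: Rule, samples: Sequence[Sample]) -> float:
    """Fraction of samples that satisfy every condition in the rule."""
    if not samples:
        return 0.0
    return sum(rule <= s for s in samples) / len(samples)


def mine_rules(target: Sequence[Sample], other: Sequence[Sample],
               max_len: int = 2, min_support: float = 0.5) -> List[Rule]:
    """Brute-force search for conjunctions of up to max_len conditions that cover
    at least min_support of the target class and occur in no sample of the other
    class. Supersets of rules already found are skipped, so only minimal rules remain."""
    items = sorted(set().union(*target))
    rules: List[Rule] = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            rule = frozenset(combo)
            if any(r <= rule for r in rules):
                continue
            if support(rule, other) == 0.0 and support(rule, target) >= min_support:
                rules.append(rule)
    return rules


def transform(sample: Sample, rules: Sequence[Rule]) -> Tuple[int, ...]:
    """Map a sample into the new feature space: one binary feature per rule."""
    return tuple(int(rule <= sample) for rule in rules)


def fmt(rule: Rule) -> str:
    """Readable form of a rule; conditions are sorted so the output is deterministic."""
    return " AND ".join(sorted(rule))


def explain(sample: Sample, rules_by_class: Dict[str, List[Rule]]) -> Dict[str, List[str]]:
    """For each class, list the rules that the sample satisfies."""
    return {c: [fmt(r) for r in rs if r <= sample] for c, rs in rules_by_class.items()}


# Hypothetical toy data: three samples per class.
A = [frozenset(s) for s in ({"g1=high", "g2=low", "g3=high"},
                            {"g1=high", "g2=high", "g3=high"},
                            {"g1=high", "g2=low", "g3=low"})]
B = [frozenset(s) for s in ({"g1=low", "g2=low", "g3=high"},
                            {"g1=low", "g2=high", "g3=low"},
                            {"g1=high", "g2=high", "g3=low"})]

rules = {"A": mine_rules(A, B), "B": mine_rules(B, A)}
features = rules["A"] + rules["B"]

print([fmt(r) for r in rules["A"]])  # ['g1=high AND g2=low', 'g1=high AND g3=high']
print([fmt(r) for r in rules["B"]])  # ['g1=low', 'g2=high AND g3=low']
for s in A + B:
    print(transform(s, features))
# (1, 1, 0, 0)  (0, 1, 0, 0)  (1, 0, 0, 0)  <- class A
# (0, 0, 1, 0)  (0, 0, 1, 1)  (0, 0, 0, 1)  <- class B

print(explain(frozenset({"g1=low", "g2=high", "g3=high"}), rules))  # {'A': [], 'B': ['g1=low']}
```

In the new feature space, every class-A sample has at least one nonzero entry among the first two features and zeros in the last two, and the reverse holds for class B. The re-encoded toy data are therefore separable by inspection. The `explain` output shows the kind of rule-level justification that motivates preferring rule-based methods over kernel classifiers.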