Gene expression profiles and proteomic data are extremely high-dimensional. Although support vector machines can learn the underlying relationships in such data well enough to classify it, their non-linear kernel functions make it hard to explain the reasons for a prediction to non-specialists. We prefer rule-based methods because they are easy to interpret. In this paper, we first discuss feature space transformation. Each new feature (a rule) is a combination of multiple original features, chosen so that it covers a large percentage of one class of data and never occurs in the other class. Described in terms of these new features, training or test data are clearly class-separable. We then discuss a more sophisticated rule-based classification method called PCL. PCL provides easily explainable classification scores that help us better understand both the predictions and the test data themselves. Visualization is also used to make the classifier output easier to understand. We use a rich set of examples to demonstrate our main points.
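
The feature space transformation described above can be illustrated with a small sketch. It is not the paper's mining algorithm: it assumes expression values have already been discretized into items such as "g1=high", it enumerates candidate rules by brute force, and the names `support`, `mine_rules`, and `transform`, the 0.5 support threshold, and the toy data are all hypothetical choices made for illustration only.

```python
from itertools import combinations


def support(rule, rows):
    """Fraction of rows that contain every item of the rule."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if rule <= row) / len(rows)


def mine_rules(target, other, max_len=2, min_support=0.5):
    """Brute-force search (illustrative only) for minimal item combinations
    that cover at least min_support of `target` and never occur in `other`."""
    items = sorted(set().union(*target))
    rules = []
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            rule = frozenset(combo)
            # Keep only minimal rules: skip supersets of an accepted rule.
            if any(found <= rule for found, _ in rules):
                continue
            s = support(rule, target)
            if s >= min_support and support(rule, other) == 0.0:
                rules.append((rule, s))
    return rules


def transform(row, rules):
    """Re-describe a sample as one binary feature per rule."""
    return [1 if rule <= row else 0 for rule, _ in rules]


# Hypothetical toy data: three samples per class, three discretized genes.
class_a = [
    frozenset({"g1=high", "g2=low", "g3=low"}),
    frozenset({"g1=high", "g2=low", "g3=high"}),
    frozenset({"g1=high", "g2=high", "g3=high"}),
]
class_b = [
    frozenset({"g1=low", "g2=low", "g3=high"}),
    frozenset({"g1=high", "g2=high", "g3=low"}),
    frozenset({"g1=low", "g2=high", "g3=low"}),
]

rules_a = mine_rules(class_a, class_b)
rules_b = mine_rules(class_b, class_a)
all_rules = rules_a + rules_b

new_a = [transform(row, all_rules) for row in class_a]
new_b = [transform(row, all_rules) for row in class_b]
```

In this toy setting no single item separates class A from class B, but in the transformed space every class A sample fires only class A rules and every class B sample fires only class B rules, which is the sense in which the re-described data become class-separable. Brute-force enumeration would not scale to real gene expression data with thousands of features; it is used here only to keep the sketch short.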