SSDT: A Scalable Subspace-Splitting Classifier for Biased Data

Authors:
Haixun Wang;Philip S. Yu
Venue:
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Year:
2001

Cited by (2):

Modeling (in)variability of human judgments for text summarization
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval

Supervised ranking in open-domain text summarization
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

Abstract

Decision trees are one of the most widely used data mining models. Recently, a number of efficient, scalable algorithms for constructing decision trees on large disk-resident datasets have been introduced. In this paper, we study the problem of learning scalable decision trees from datasets with a biased class distribution. Our objective is to build decision trees that are more concise and more interpretable while maintaining the scalability of the model. To achieve this, our approach searches for subspace clusters of data cases from the biased class, which enables multivariate splits based on weighted distances to such clusters. Other approaches to building concise and interpretable models, including multivariate decision trees and association rules, often introduce scalability and performance problems. The SSDT algorithm we present achieves this objective without loss of efficiency, scalability, or accuracy.
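To make the idea of a split on "weighted distances to subspace clusters" concrete, here is a minimal sketch under stated assumptions. It is not the SSDT algorithm from the paper. It assumes a cluster of biased-class cases has already been found (the cluster search is not shown). The cluster is represented by a hypothetical `SubspaceCluster` with a center and per-attribute weights over a subset of attributes, and distances are weighted Euclidean. The split radius is chosen with the Gini index, a common decision-tree criterion that the abstract does not name.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class SubspaceCluster:
    """Hypothetical cluster that lives in a subset of the attributes (a subspace).

    `center` and `weights` are keyed by the same attribute indices; any attribute
    not listed is outside the subspace and ignored by the distance.
    """
    center: Dict[int, float]
    weights: Dict[int, float]

    def weighted_distance(self, x: Sequence[float]) -> float:
        # Weighted Euclidean distance computed only over the cluster's subspace.
        return sum(w * (x[i] - self.center[i]) ** 2
                   for i, w in self.weights.items()) ** 0.5


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_radius(rows: List[Sequence[float]], labels: List[int],
                cluster: SubspaceCluster) -> Tuple[float, float]:
    """Try every observed distance as a threshold; return (radius, weighted Gini).

    Assumption: Gini is used as the split criterion for illustration only.
    """
    dists = [cluster.weighted_distance(r) for r in rows]
    best = None
    for radius in sorted(set(dists)):
        left = [y for d, y in zip(dists, labels) if d <= radius]
        right = [y for d, y in zip(dists, labels) if d > radius]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (radius, score)
    return best


def split_by_cluster(rows, cluster: SubspaceCluster, radius: float):
    """Multivariate split: cases within `radius` of the cluster go left, others right."""
    inside, outside = [], []
    for row in rows:
        (inside if cluster.weighted_distance(row) <= radius else outside).append(row)
    return inside, outside


if __name__ == "__main__":
    # Toy data: the biased (rare) class is labeled 1 and sits near the cluster.
    rows = [(1.0, 9.0, 5.0), (1.5, -3.0, 5.5), (4.0, 0.0, 0.0), (0.0, 2.0, 9.0)]
    labels = [1, 1, 0, 0]
    # The cluster uses attributes 0 and 2 only; attribute 1 is ignored.
    cluster = SubspaceCluster(center={0: 1.0, 2: 5.0}, weights={0: 1.0, 2: 0.5})
    radius, score = best_radius(rows, labels, cluster)
    inside, outside = split_by_cluster(rows, cluster, radius)
    print(radius, score, inside, outside)
```

In this toy case a single distance test separates the two rare-class cases from the rest. Two univariate splits would be needed to express the same region, which illustrates why distance-based splits can give more concise trees.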