Iterative Data Squashing for Boosting Based on a Distribution-Sensitive Distance

  • Authors:
  • Yuta Choki;Einoshin Suzuki

  • Affiliations:
  • -;-

  • Venue:
  • PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
  • Year:
  • 2002

Abstract

This paper proposes a novel method for boosting that prevents the deterioration of accuracy inherent to data squashing methods. Boosting, which constructs a highly accurate classification model by combining multiple classification models, requires a long computation time. Data squashing, which speeds up a learning method by abstracting the training data set into a smaller data set, typically lowers accuracy. Our SB (Squashing-Boosting) loop, based on a distribution-sensitive distance, alternates data squashing and boosting, and iteratively refines an SF (Squashed-Feature) tree, which provides an appropriately squashed data set. Experimental evaluation with artificial data sets and the KDD Cup 1999 data set clearly shows the superiority of our method compared with conventional methods. We have also empirically evaluated our distance measure and our SF tree, and found them superior to the alternatives.
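
The abstract does not detail the SF tree or the distribution-sensitive distance, so the following is only a minimal sketch of the squash-then-boost alternation under stated assumptions: plain Euclidean k-means centroids (weighted by cluster size) stand in for the paper's SF-tree squashing, scikit-learn's AdaBoostClassifier stands in for the boosting step, and the per-round refinement is a simple coarse-to-fine schedule rather than the paper's iterative SF-tree refinement. Names such as `squash` and `sb_loop` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier


def squash(X, y, n_clusters, seed):
    """Stand-in squashing step: cluster each class separately and represent
    every cluster by its centroid, weighted by the cluster size.
    (The paper builds an SF tree with a distribution-sensitive distance;
    plain Euclidean k-means is only a placeholder here.)"""
    Xs, ys, ws = [], [], []
    for label in np.unique(y):
        Xc = X[y == label]
        k = min(n_clusters, len(Xc))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Xc)
        counts = np.bincount(km.labels_, minlength=k)
        Xs.append(km.cluster_centers_)
        ys.append(np.full(k, label))
        ws.append(counts)
    return np.vstack(Xs), np.concatenate(ys), np.concatenate(ws).astype(float)


def sb_loop(X, y, n_rounds=3, base_clusters=50, seed=0):
    """Alternate squashing and boosting: each round re-squashes the training
    data (here at a finer granularity) and fits a boosted ensemble on the
    squashed, weight-carrying data set."""
    model = None
    for r in range(n_rounds):
        Xs, ys, ws = squash(X, y, base_clusters * (r + 1), seed)
        model = AdaBoostClassifier(n_estimators=50, random_state=seed)
        model.fit(Xs, ys, sample_weight=ws)  # weights carry the squashed counts
    return model


if __name__ == "__main__":
    # Toy usage on synthetic data, standing in for the artificial data sets
    # mentioned in the abstract.
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
    clf = sb_loop(X, y)
    print("training accuracy on full data:", clf.score(X, y))
```

The design point illustrated is only the alternation itself: boosting never sees the full training set, only a small weighted surrogate that is re-derived each round, which is how squashing trades accuracy for speed.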