SubClass: classification of multidimensional noisy data using subspace clusters

Authors:
Ira Assent;Ralph Krieger;Petra Welter;Jörg Herbers;Thomas Seidl
Affiliations:
Data Management and Exploration Group, RWTH Aachen University, Aachen, Germany;Data Management and Exploration Group, RWTH Aachen University, Aachen, Germany;Data Management and Exploration Group, RWTH Aachen University, Aachen, Germany;INFORM GmbH, Aachen, Germany;Data Management and Exploration Group, RWTH Aachen University, Aachen, Germany
Venue:
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2008

Abstract

Classification has been widely studied and successfully employed in various application domains. In multidimensional noisy settings, however, classification accuracy may be unsatisfactory. Locally irrelevant attributes often occlude class-relevant information. A global reduction to relevant attributes is often infeasible, as relevance of attributes is not necessarily a globally uniform property. In a current project with an airport scheduling software company, locally varying attributes in the data indicate whether flights will be on time, delayed, or ahead of schedule. To detect locally relevant information, we propose combining classification with subspace clustering (SubClass). Subspace clustering aims at detecting clusters in arbitrary subspaces of the attributes. It has proved to work well in multidimensional and noisy domains. However, it does not utilize class label information and thus does not necessarily provide appropriate groupings for classification. We propose incorporating class label information into subspace search. As a result, we obtain locally relevant attribute combinations for classification. We present the SubClass classifier, which successfully exploits classifying subspace cluster information. Experiments on both synthetic and real-world datasets demonstrate that classification accuracy is clearly improved for noisy multidimensional settings.
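
The idea of letting class labels guide the subspace search can be illustrated with a small sketch. This is not the SubClass algorithm, whose details the abstract does not give. It is a toy stand-in that assumes attribute values in [0, 1), an equal-width grid, and class entropy per grid cell as the quality measure. All function names, the `bins` parameter, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log2


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def conditional_class_entropy(points, labels, dims, bins=2):
    """Weighted average class entropy over the grid cells of subspace `dims`.

    Assumption: values lie in [0, 1) and each chosen attribute is cut into
    `bins` equal-width intervals. Lower scores mean purer cells.
    """
    cells = defaultdict(list)
    for point, label in zip(points, labels):
        cell = tuple(min(int(point[d] * bins), bins - 1) for d in dims)
        cells[cell].append(label)
    n = len(points)
    return sum(len(ls) / n * class_entropy(ls) for ls in cells.values())


def rank_subspaces(points, labels, max_dims=2, bins=2):
    """Score every attribute subset up to size `max_dims`, purest first."""
    n_attrs = len(points[0])
    scored = []
    for k in range(1, max_dims + 1):
        for dims in combinations(range(n_attrs), k):
            score = conditional_class_entropy(points, labels, dims, bins)
            scored.append((score, dims))
    return sorted(scored)


# Toy data: attribute 0 separates the classes, attribute 1 is noise.
points = [(0.1, 0.2), (0.2, 0.8), (0.3, 0.4), (0.4, 0.9),
          (0.6, 0.1), (0.7, 0.7), (0.8, 0.3), (0.9, 0.6)]
labels = ["on_time"] * 4 + ["delayed"] * 4

for score, dims in rank_subspaces(points, labels):
    print(dims, round(score, 3))
```

On this toy data the subspace made of attribute 0 alone ranks first, and the noise attribute alone ranks last. A label-blind clustering criterion would have no reason to prefer one over the other, which is the gap the abstract describes.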