SubClass: classification of multidimensional noisy data using subspace clusters

  • Authors:
  • Ira Assent; Ralph Krieger; Petra Welter; Jörg Herbers; Thomas Seidl

  • Affiliations:
  • Data Management and Exploration Group, RWTH Aachen University, Aachen, Germany; Data Management and Exploration Group, RWTH Aachen University, Aachen, Germany; Data Management and Exploration Group, RWTH Aachen University, Aachen, Germany; INFORM GmbH, Aachen, Germany; Data Management and Exploration Group, RWTH Aachen University, Aachen, Germany

  • Venue:
  • PAKDD'08: Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2008

Abstract

Classification has been widely studied and successfully employed in various application domains. In multidimensional noisy settings, however, classification accuracy may be unsatisfactory, because locally irrelevant attributes often occlude class-relevant information. A global reduction to relevant attributes is often infeasible, as the relevance of attributes is not necessarily a globally uniform property. In a current project with an airport scheduling software company, locally varying attributes in the data indicate whether flights will be on time, delayed, or ahead of schedule. To detect locally relevant information, we propose combining classification with subspace clustering (SubClass). Subspace clustering aims at detecting clusters in arbitrary subspaces of the attributes and has proven to work well in multidimensional and noisy domains. However, it does not utilize class label information and thus does not necessarily provide groupings that are appropriate for classification. We therefore propose incorporating class label information into the subspace search. As a result, we obtain locally relevant attribute combinations for classification. We present the SubClass classifier, which successfully exploits this classifying subspace cluster information. Experiments on both synthetic and real-world datasets demonstrate that classification accuracy is clearly improved in noisy multidimensional settings.
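The abstract does not spell out how SubClass searches subspaces, so the following is only a rough, hypothetical sketch of the general idea of letting class labels guide a subspace search. It is not the authors' algorithm. It assumes attribute values are normalized to [0, 1], uses a simple equal-width grid, exhaustively enumerates all subspaces up to `max_dim` attributes, and keeps grid cells that are both dense (`min_support`) and dominated by one class label (`min_purity`). A new object is then classified by a purity-weighted vote of the cells it falls into. All function names, parameters, thresholds, and the toy flight-delay data are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def discretize(x, bins):
    """Map each attribute value (assumed to lie in [0, 1]) to its equal-width grid interval."""
    return tuple(min(max(int(v * bins), 0), bins - 1) for v in x)


def find_classifying_units(X, y, bins=4, max_dim=2, min_support=3, min_purity=0.8):
    """Hypothetical class-aware subspace search.

    Returns (subspace, cell, label, purity) tuples for grid cells that contain at
    least `min_support` objects, of which at least a `min_purity` fraction share
    one class label.
    """
    cells = [discretize(x, bins) for x in X]
    n_attrs = len(X[0])
    units = []
    for k in range(1, max_dim + 1):
        for dims in combinations(range(n_attrs), k):
            # Count class labels per grid cell, projected onto this subspace.
            groups = {}
            for cell, label in zip(cells, y):
                key = tuple(cell[d] for d in dims)
                groups.setdefault(key, Counter())[label] += 1
            for key, counts in groups.items():
                support = sum(counts.values())
                label, hits = counts.most_common(1)[0]
                purity = hits / support
                if support >= min_support and purity >= min_purity:
                    units.append((dims, key, label, purity))
    return units


def classify(x, units, bins=4, default=None):
    """Purity-weighted vote over all classifying units that contain x."""
    cell = discretize(x, bins)
    votes = Counter()
    for dims, key, label, purity in units:
        if tuple(cell[d] for d in dims) == key:
            votes[label] += purity
    return votes.most_common(1)[0][0] if votes else default


# Made-up toy data: attribute 0 separates the classes, attributes 1 and 2 are noise.
X_toy = [
    [0.10, 0.9, 0.3], [0.15, 0.2, 0.8], [0.20, 0.5, 0.1], [0.05, 0.7, 0.6],
    [0.80, 0.1, 0.4], [0.90, 0.6, 0.9], [0.85, 0.3, 0.2], [0.95, 0.8, 0.7],
]
y_toy = ["delayed"] * 4 + ["on_time"] * 4

units = find_classifying_units(X_toy, y_toy)
print(classify([0.12, 0.4, 0.95], units))  # prints "delayed"
```

On this toy data, only the one-dimensional subspace of attribute 0 yields classifying cells, so the noisy attributes do not influence the vote. A point falling into a cell that no classifying unit covers gets the `default` label, which is one simple way to handle regions where no locally relevant subspace was found.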