Towards improving subspace data analysis

  • Authors: Yong Shi
  • Affiliations: Kennesaw State University, Kennesaw, GA
  • Venue: Proceedings of the 48th Annual Southeast Regional Conference
  • Year: 2010

Abstract

In this paper, we continue our research on data analysis, building on our previous work on cluster-outlier iterative detection in subspace. Based on the observation that, for noisy data sets, clusters and outliers cannot be processed efficiently when they are handled separately from each other, we proposed a cluster-outlier iterative detection algorithm in the full data space in our previous work [22]. Because real data sets normally have high dimensionality, and natural clusters and outliers do not exist in the full data space, we then proposed an algorithm (SubCOID) to detect clusters and outliers in subspace [21]. However, associating each cluster and each outlier with its own subset of dimensions is not a trivial task. In this paper, we present an improved SubCOID algorithm that applies a novel approach to choosing a unique subset of dimensions for each cluster and each outlier. The selection is based on the intra-relationship within clusters, the intra-relationship within outliers, and the inter-relationship between clusters and outliers. This process is performed iteratively until a termination condition is reached. The algorithm can be applied in many fields, such as pattern recognition, data clustering, and signal processing.
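The abstract does not spell out the selection rule, so the following is only a minimal sketch of the general idea of iterative per-cluster dimension selection, not the SubCOID algorithm itself. It assumes, purely for illustration, that each cluster keeps the `k` dimensions with the lowest within-cluster variance, that points are reassigned to the nearest centroid measured in that cluster's own subspace, and that iteration stops when assignments no longer change. The function names (`select_dims`, `subspace_dist`, `iterate_subspaces`) and parameters are hypothetical, and outlier handling is omitted.

```python
from statistics import pvariance


def select_dims(members, k):
    """Return the k dimensions along which `members` vary least (assumed criterion)."""
    n_dims = len(members[0])
    variances = [pvariance([p[d] for p in members]) for d in range(n_dims)]
    lowest = sorted(range(n_dims), key=lambda d: variances[d])[:k]
    return sorted(lowest)


def subspace_dist(point, centroid, dims):
    """Mean squared distance between point and centroid over the chosen dimensions."""
    return sum((point[d] - centroid[d]) ** 2 for d in dims) / len(dims)


def iterate_subspaces(points, labels, n_clusters, k, max_iter=20):
    """Alternate dimension selection and reassignment until labels are stable."""
    labels = list(labels)
    dims = {}
    for _ in range(max_iter):
        centroids, dims = {}, {}
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                continue  # an empty cluster is skipped in this sketch
            centroids[c] = [sum(col) / len(members) for col in zip(*members)]
            dims[c] = select_dims(members, k)
        # Each point joins the cluster whose subspace centroid is nearest.
        new_labels = [
            min(centroids, key=lambda c: subspace_dist(p, centroids[c], dims[c]))
            for p in points
        ]
        if new_labels == labels:  # termination condition assumed here
            break
        labels = new_labels
    return labels, dims
```

Calling `iterate_subspaces(points, initial_labels, n_clusters=2, k=1)` on a small list of coordinate tuples returns the final labels together with a dictionary mapping each cluster to its selected dimensions. A faithful implementation would replace the variance criterion with the intra- and inter-relationship measures described in the paper.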