Mining Approximate Order Preserving Clusters in the Presence of Noise

  • Authors:
  • Mengsheng Zhang;Wei Wang;Jinze Liu

  • Affiliations:
  • Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3175 USA. mszhang@cs.unc.edu;Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3175 USA. weiwang@cs.unc.edu;Department of Computer Science, University of Kentucky, Lexington, KY 40506-0046 USA. liuj@cs.uky.edu

  • Venue:
  • ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
  • Year:
  • 2008

Abstract

Subspace clustering has attracted great attention due to its capability of finding salient patterns in high-dimensional data. Order preserving subspace clusters have proven to be important in high-throughput gene expression analysis, since functionally related genes are often co-expressed under a set of experimental conditions. Such co-expression patterns can be represented by consistent orderings of attributes. Existing order preserving cluster models require that all objects in a cluster have an identical attribute order without deviation. However, real data are noisy because of limitations in measurement technology and experimental variability, which prevents these strict models from revealing true clusters corrupted by noise. In this paper, we study the problem of revealing order preserving clusters in the presence of noise. We propose a noise-tolerant model called the approximate order preserving cluster (AOPC). Instead of requiring all objects in a cluster to have an identical attribute order, we require that (1) at least a certain fraction of the objects have an identical attribute order, and (2) the other objects in the cluster may deviate from this consensus order by up to a certain fraction of attributes. We also propose an algorithm to mine AOPCs. Experiments on gene expression data demonstrate the efficiency and effectiveness of our algorithm.
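The abstract states the two AOPC conditions only informally. The sketch below checks whether a given candidate cluster (a set of rows restricted to a set of attribute columns) satisfies them, under assumptions that are not taken from the paper: an object's attribute order is the ascending argsort of its values on the chosen columns; `alpha` is the minimum fraction of objects that must match the consensus order exactly; and an object "deviates by up to a fraction `delta` of attributes" when the longest common subsequence (LCS) of its order and the consensus order covers at least `(1 - delta)` of the attributes. The names `attribute_order`, `lcs_length`, and `is_aopc` are hypothetical. This only verifies one candidate; it is not the authors' mining algorithm, which the abstract does not describe.

```python
from typing import List, Sequence


def attribute_order(values: Sequence[float], attrs: Sequence[int]) -> List[int]:
    """Return the attributes in `attrs` sorted by this object's values, ascending.

    Ties keep their relative position in `attrs` because sorted() is stable.
    """
    return sorted(attrs, key=lambda a: values[a])


def lcs_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest common subsequence of two sequences (standard DP)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]


def is_aopc(
    rows: Sequence[Sequence[float]],
    attrs: Sequence[int],
    consensus: Sequence[int],
    alpha: float,
    delta: float,
) -> bool:
    """Check the two AOPC conditions (as assumed above) for one candidate cluster.

    (1) at least `alpha` of the rows follow `consensus` exactly;
    (2) every other row keeps at least (1 - delta) of the attributes in
        consensus order, measured by LCS length.
    """
    if sorted(consensus) != sorted(attrs):
        raise ValueError("consensus must be a permutation of attrs")
    k = len(consensus)
    exact = 0
    for row in rows:
        order = attribute_order(row, attrs)
        if order == list(consensus):
            exact += 1
        elif lcs_length(order, consensus) < (1 - delta) * k:
            return False  # this row deviates from the consensus too much
    return exact >= alpha * len(rows)


rows = [[1, 2, 3, 4], [10, 20, 30, 40], [2, 1, 3, 4]]
print(is_aopc(rows, [0, 1, 2, 3], [0, 1, 2, 3], alpha=0.6, delta=0.25))
print(is_aopc(rows, [0, 1, 2, 3], [0, 1, 2, 3], alpha=0.6, delta=0.0))
```

In this example two rows match the consensus `[0, 1, 2, 3]` exactly, and the third row has the order `[1, 0, 2, 3]`, whose LCS with the consensus has length 3 out of 4. The check therefore passes with `delta=0.25` but fails with `delta=0.0`, which corresponds to the strict model where no deviation is allowed. LCS is used here only as one plausible way to quantify "deviating by a fraction of attributes".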