An iterative strategy for pattern discovery in high-dimensional data sets

  • Authors:
  • Chun Tang; Aidong Zhang

  • Affiliations:
  • State University of New York at Buffalo, Buffalo, NY; State University of New York at Buffalo, Buffalo, NY

  • Venue:
  • Proceedings of the eleventh international conference on Information and knowledge management
  • Year:
  • 2002

Abstract

High-dimensional data representation, in which each data item (termed a target object) is described by many features, is a necessary component of many applications. For example, in DNA microarrays, each sample (target object) is represented by thousands of genes as features. Pattern discovery among target objects presents interesting but very challenging problems. The data sets are typically not task-specific: many features are irrelevant or redundant and should be pruned or filtered out before target objects are classified to find empirical patterns. Uncertainty about which features are relevant makes it difficult to construct an informative feature space. This paper proposes an iterative strategy for pattern discovery in high-dimensional data sets. The iterative process consists of two interacting components: discovering patterns within target objects and pruning irrelevant features. The performance of the proposed method on various real data sets is also illustrated.
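
The two-component loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the clustering method, the feature-relevance criterion, or the stopping rule, so everything below is an assumption made for illustration: plain k-means stands in for the pattern-discovery step, a between-cluster to within-cluster variance ratio stands in for the relevance score, and the names `kmeans`, `feature_scores`, `iterative_discovery`, `keep_fraction`, and `min_features` are hypothetical. It is not the authors' algorithm.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (an assumed stand-in for pattern discovery)."""
    # Fancy indexing copies the rows, so updating centers leaves X untouched.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center as is
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def feature_scores(X, labels):
    """Per-feature ratio of between-cluster to within-cluster variation
    (an assumed relevance criterion, not taken from the paper)."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for j in np.unique(labels):
        members = X[labels == j]
        center = members.mean(axis=0)
        between += len(members) * (center - overall) ** 2
        within += ((members - center) ** 2).sum(axis=0)
    return between / (within + 1e-12)


def iterative_discovery(X, k=2, keep_fraction=0.5, min_features=5, seed=0):
    """Alternate clustering of target objects with pruning of low-scoring
    features until no further pruning is possible."""
    rng = np.random.default_rng(seed)
    active = np.arange(X.shape[1])  # indices of features still in use
    while True:
        labels = kmeans(X[:, active], k, rng)
        n_keep = max(min_features, int(len(active) * keep_fraction))
        if n_keep >= len(active):
            # Nothing left to prune; labels match the current feature set.
            return labels, active
        scores = feature_scores(X[:, active], labels)
        keep = np.sort(np.argsort(scores)[::-1][:n_keep])
        active = active[keep]


# Synthetic demo: 40 objects, 200 features, only the first 5 separate groups.
demo_rng = np.random.default_rng(1)
demo_X = demo_rng.normal(size=(40, 200))
demo_X[:20, :5] += 3.0
demo_labels, demo_features = iterative_discovery(demo_X)
print(demo_labels, demo_features)
```

Each round reclusters the target objects using only the surviving features, so the partition and the feature set refine each other. The fixed `keep_fraction` schedule is a design shortcut for the sketch; a data-driven threshold would be an equally plausible choice.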