An iterative strategy for pattern discovery in high-dimensional data sets

Authors:
Chun Tang;Aidong Zhang
Affiliations:
State University of New York at Buffalo, Buffalo, NY;State University of New York at Buffalo, Buffalo, NY
Venue:
Proceedings of the eleventh international conference on Information and knowledge management
Year:
2002

Citing 10

Self-organization and associative memory: 3rd edition
Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Fast algorithms for projected clustering. SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Data mining: building competitive advantage
Data mining: concepts and techniques
Class discovery in gene expression data. RECOMB '01 Proceedings of the fifth annual international conference on Computational biology
Analysis of gene expression profiles: class discovery and leaf ordering. Proceedings of the sixth annual international conference on Computational biology
Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces. VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
d-Clusters: Capturing Subspace Correlation in a Large Data Set. ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Graph partitioning for high-performance scientific simulations. Sourcebook of parallel computing

Cited 4

Mining multiple phenotype structures underlying gene expression profiles. CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Subspace clustering for high dimensional data: a review. ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Cluster Analysis for Gene Expression Data: A Survey. IEEE Transactions on Knowledge and Data Engineering
Frequent pattern discovery in online environment. AIA'06 Proceedings of the 24th IASTED international conference on Artificial intelligence and applications

Abstract

High-dimensional data representation, in which each data item (termed a target object) is described by many features, is a necessary component of many applications. For example, in DNA microarrays, each sample (target object) is represented by thousands of genes as features. Pattern discovery among target objects presents interesting but very challenging problems. Because the data sets are typically not task-specific, many features are irrelevant or redundant and should be pruned or filtered out in order to classify target objects and find empirical patterns. Uncertainty about which features are relevant makes it difficult to construct an informative feature space. This paper proposes an iterative strategy for pattern discovery in high-dimensional data sets. The iterative process consists of two interacting components: discovering patterns within target objects and pruning irrelevant features. The performance of the proposed method on various real data sets is also illustrated.
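
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of alternating between pattern discovery and feature pruning, not the authors' method. It assumes a plain 2-means clustering as the pattern-discovery step and a simple mean-separation score as the pruning criterion; the function names (`two_means`, `feature_scores`, `iterative_discovery`), the parameters (`keep_fraction`, `min_features`, `max_rounds`), and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def two_means(X, n_iter=50):
    """Plain 2-means clustering (an assumed stand-in for pattern discovery)."""
    # Deterministic start: the row farthest from the mean, then the row farthest from it.
    first = np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))
    second = np.argmax(np.linalg.norm(X - X[first], axis=1))
    centers = X[[first, second]].astype(float)
    labels = None
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels


def feature_scores(X, labels, eps=1e-9):
    """Score each feature by how well it separates the two current groups."""
    a, b = X[labels == 0], X[labels == 1]
    if len(a) == 0 or len(b) == 0:
        return np.zeros(X.shape[1])
    gap = np.abs(a.mean(axis=0) - b.mean(axis=0))
    return gap / (a.std(axis=0) + b.std(axis=0) + eps)


def iterative_discovery(X, keep_fraction=0.7, min_features=5, max_rounds=20):
    """Alternate between grouping the objects and pruning low-scoring features."""
    kept = np.arange(X.shape[1])
    labels = two_means(X)
    for _ in range(max_rounds):
        if len(kept) <= min_features:
            break
        scores = feature_scores(X[:, kept], labels)
        n_keep = max(min_features, int(len(kept) * keep_fraction))
        # Keep the highest-scoring features, preserving their original order.
        kept = kept[np.sort(np.argsort(scores)[-n_keep:])]
        # Re-discover the grouping in the pruned feature space.
        labels = two_means(X[:, kept])
    return labels, kept


# Synthetic stand-in for a microarray: 40 samples, 200 features,
# of which only the first 10 carry the (hidden) two-group structure.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 200))
X[:, :10] += 6.0 * truth[:, None]

labels, kept = iterative_discovery(X, min_features=10)
print("features kept:", kept.tolist())
print("agreement with hidden grouping:", max((labels == truth).mean(), (labels != truth).mean()))
```

Each round uses the current grouping to decide which features look irrelevant, then regroups the objects using only the surviving features, so the two steps inform each other. A real application would likely need a more careful clustering step, a principled stopping rule, and a pruning score suited to the data.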