HARP: A Practical Projected Clustering Algorithm

Authors:
Kevin Y. Yip;David W. Cheung;Michael K. Ng
Affiliations:
-;IEEE Computer Society;-
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2004

Citing 14
Cited 27

CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Clustering gene expression patterns

RECOMB '99 Proceedings of the third annual international conference on Computational molecular biology
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Fast hierarchical clustering and other applications of dynamic closest pairs

Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Finding generalized projected clusters in high dimensional spaces

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Data mining: concepts and techniques

Data mining: concepts and techniques
Clustering Algorithms

Clustering Algorithms
Clustering by pattern similarity in large data sets

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
A Monte Carlo algorithm for fast projective clustering

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Biclustering of Expression Data

Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
Efficient and Effective Clustering Methods for Spatial Data Mining

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
ROCK: A Robust Clustering Algorithm for Categorical Attributes

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
MaPle: A Fast Algorithm for Maximal Pattern-based Clustering

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining

On Discovery of Extremely Low-Dimensional Clusters Using Semi-Supervised Projected Clustering

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Comparing Subspace Clusterings

IEEE Transactions on Knowledge and Data Engineering
An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data

IEEE Transactions on Knowledge and Data Engineering
Strategies for Identifying Statistically Significant Dense Regions in Microarray Data

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
APPLYING DATA MINING TECHNIQUES FOR CANCER CLASSIFICATION ON GENE EXPRESSION DATA

Cybernetics and Systems
An Unsupervised Approach to Cluster Web Search Results Based on Word Sense Communities

WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering

ACM Transactions on Knowledge Discovery from Data (TKDD)
Query result clustering for object-level search

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
A semi-supervised approach to projected clustering with applications to microarray data

International Journal of Data Mining and Bioinformatics
Enhanced soft subspace clustering integrating within-cluster and between-cluster information

Pattern Recognition
Subspace and projected clustering: experimental evaluation and analysis

Knowledge and Information Systems
SKM-SNP: SNP markers detection method

Journal of Biomedical Informatics
A fast algorithm for finding correlation clusters in noise data

PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Mining quality-aware subspace clusters

PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Comparative analysis of biclustering algorithms

Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
A novel attribute weighting algorithm for clustering high-dimensional categorical data

Pattern Recognition
A feature group weighting method for subspace clustering of high-dimensional data

Pattern Recognition
EEW-SC: Enhanced Entropy-Weighting Subspace Clustering for high dimensional gene expression data clustering analysis

Applied Soft Computing
A robust seedless algorithm for correlation clustering

PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Partitive clustering (K-means family)

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Subspace clustering

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Projective clustering ensembles

Data Mining and Knowledge Discovery
A weighting k-modes algorithm for subspace clustering of categorical data

Neurocomputing
Fuzzy partition based soft subspace clustering and its applications in high dimensional data

Information Sciences: an International Journal
Hybrid entity clustering using crowds and data

The VLDB Journal — The International Journal on Very Large Data Bases
Semi-supervised projected model-based clustering

Data Mining and Knowledge Discovery

Quantified Score

Hi-index	0.01

Visualization

Abstract

In high-dimensional data, clusters can exist in subspaces that hide themselves from traditional clustering methods. A number of algorithms have been proposed to identify such projected clusters, but most of them rely on some user parameters to guide the clustering process. The clustering accuracy can be seriously degraded if incorrect values are used. Unfortunately, in real situations, it is rarely possible for users to supply the parameter values accurately, which causes practical difficulties in applying these algorithms to real data. In this paper, we analyze the major challenges of projected clustering and suggest why these algorithms need to depend heavily on user parameters. Based on the analysis, we propose a new algorithm that exploits the clustering status to adjust the internal thresholds dynamically without the assistance of user parameters. According to the results of extensive experiments on real and synthetic data, the new method has excellent accuracy and usability. It outperformed the other algorithms even when correct parameter values were artificially supplied to them. The encouraging results suggest that projected clustering can be a practical tool for various kinds of real applications.