CLINCH: clustering incomplete high-dimensional data for data mining application

Authors:
Zunping Cheng; Ding Zhou; Chen Wang; Jiankui Guo; Wei Wang; Baokang Ding; Baile Shi
Affiliations:
Fudan University, China; Pennsylvania State University; Fudan University, China; Fudan University, China; Fudan University, China; Fudan University, China; Fudan University, China
Venue:
APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development
Year:
2005

Citing 13

Statistical analysis with missing data
C4.5: programs for machine learning
Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
A comparative study of clustering methods. Future Generation Computer Systems - Special double issue on data mining
Fast algorithms for projected clustering. SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Entropy-based subspace clustering for mining numerical data. KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Finding generalized projected clusters in high dimensional spaces. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Privacy-preserving data mining. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Data mining: concepts and techniques
Mining massively incomplete data sets by conceptual reconstruction. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
The Design and Analysis of Computer Algorithms
Principal Component Analysis with Missing Data and Its Application to Polyhedral Object Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence
Learning from Incomplete Data

Cited 1

Consensus strategy for clustering using RC-images. Pattern Recognition

Abstract

Clustering is a common data mining technique for discovering hidden patterns in massive datasets. With the development of privacy-preserving data mining applications, clustering incomplete high-dimensional data has become increasingly useful. Motivated by the difficulties that incompleteness and high dimensionality pose, we develop a novel algorithm, CLINCH, which can produce fine clusters in an incomplete high-dimensional data space. To handle missing attributes, CLINCH employs a prediction method that can be more precise than traditional techniques. In addition, we introduce an efficient scheme that processes dimensions one by one to attack the “curse of dimensionality”. Experiments show that our algorithm not only outperforms many existing high-dimensional clustering algorithms in scalability and efficiency, but also produces precise results.