CLINCH: clustering incomplete high-dimensional data for data mining application

  • Authors:
  • Zunping Cheng;Ding Zhou;Chen Wang;Jiankui Guo;Wei Wang;Baokang Ding;Baile Shi

  • Affiliations:
  • Fudan University, China;Pennsylvania State University;Fudan University, China;Fudan University, China;Fudan University, China;Fudan University, China;Fudan University, China

  • Venue:
  • APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development
  • Year:
  • 2005

Abstract

Clustering is a common technique in data mining for discovering hidden patterns in massive datasets. With the development of privacy-maintaining data mining applications, clustering incomplete high-dimensional data has become more and more useful. Motivated by this need, we develop a novel algorithm, CLINCH, which can produce fine clusters in an incomplete high-dimensional data space. To handle missing attributes, CLINCH employs a prediction method that can be more precise than traditional techniques. We also introduce an efficient approach in which dimensions are processed one by one to attack the “curse of dimensionality”. Experiments show that our algorithm not only outperforms many existing high-dimensional clustering algorithms in scalability and efficiency, but also produces precise results.