Multi-dimensional Mass Estimation and Mass-based Clustering

  • Authors:
  • Kai Ming Ting; Jonathan R. Wells

  • Venue:
  • ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
  • Year:
  • 2010

Abstract

Mass estimation, an alternative to density estimation, has been shown recently to be an effective base modelling mechanism for three data mining tasks of regression, information retrieval and anomaly detection. This paper advances this work in two directions. First, we generalise the previously proposed one-dimensional mass estimation to multi-dimensional mass estimation, and significantly reduce the time complexity to $O(\psi h)$ from $O(\psi^{h})$, making it feasible for a full range of generic problems. Second, we introduce the first clustering method based on mass; it is unique because it does not employ any distance or density measure. The structure of the new mass model enables different parts of a cluster to be identified and merged without expensive evaluations. The characteristics of the new clustering method are: (i) it can identify arbitrary-shape clusters, (ii) it is significantly faster than existing density-based or distance-based methods, and (iii) it is noise-tolerant.