The minimum code length for clustering using the gray code
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III
Machine Learning
Mass estimation, an alternative to density estimation, has recently been shown to be an effective base modelling mechanism for three data mining tasks: regression, information retrieval and anomaly detection. This paper advances this work in two directions. First, we generalise the previously proposed one-dimensional mass estimation to multi-dimensional mass estimation, and significantly reduce the time complexity from $O(\psi^{h})$ to $O(\psi h)$, making it feasible for a full range of generic problems. Second, we introduce the first clustering method based on mass; it is unique because it does not employ any distance or density measure. The structure of the new mass model enables different parts of a cluster to be identified and merged without expensive evaluations. The characteristics of the new clustering method are: (i) it can identify arbitrary-shape clusters, (ii) it is significantly faster than existing density-based or distance-based methods, and (iii) it is noise-tolerant.