Euler clustering

  • Authors:
  • Jian-Sheng Wu; Wei-Shi Zheng; Jian-Huang Lai

  • Affiliations:
  • Jian-Sheng Wu, Wei-Shi Zheng: School of Information Science and Technology, Sun Yat-sen University, Guangzhou, P.R. China, and Guangdong Province Key Laboratory of Computational Science, Guangzhou, P.R. China
  • Jian-Huang Lai: School of Information Science and Technology, Sun Yat-sen University, Guangzhou, P.R. China

  • Venue:
  • IJCAI'13: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
  • Year:
  • 2013

Abstract

By mapping data from a lower-dimensional space into a higher or even infinite-dimensional space, kernel k-means can organize data into groups even when the clusters are not linearly separable. However, by the representer theorem, kernel k-means must keep an extremely large kernel matrix in memory when popular kernels such as the Gaussian and spatial pyramid matching kernels are used, which incurs large-scale computation and severely limits its use on large-scale data. In addition, existing kernel clustering methods can be overfitted by outliers. In this paper, we introduce Euler clustering, which retains the benefit of nonlinear modeling with a kernel function while largely removing the large-scale computational problem of kernel-based clustering. This is achieved by incorporating the Euler kernel. The Euler kernel relies on a nonlinear and robust cosine metric that is less sensitive to outliers. More importantly, it intrinsically induces an empirical map that sends the data onto a complex space of the same dimension. Euler clustering exploits these properties to measure the similarity between data points robustly without increasing the dimensionality of the data, and thus avoids the large-scale problem of kernel k-means. We evaluate Euler clustering and show its superiority over related methods on five publicly available datasets.
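
The sketch below illustrates the idea summarized in the abstract. It assumes the Euler kernel's empirical map sends each feature x onto the complex unit circle as e^{iαπx}/√2 and that clustering then proceeds with ordinary Lloyd-style k-means iterations on the explicit complex features; the value of α, the initialization, and the stopping rule are illustrative assumptions, not the authors' exact procedure.

```python
# Minimal sketch of Euler-style clustering: apply the explicit Euler map
# (same dimensionality, complex-valued) and run plain k-means on the result.
# alpha and the random initialization are hypothetical choices for this demo.
import numpy as np

def euler_map(X, alpha=1.9):
    """Explicit Euler feature map: z_j = exp(i * alpha * pi * x_j) / sqrt(2)."""
    return np.exp(1j * alpha * np.pi * X) / np.sqrt(2.0)

def euler_kmeans(X, k, alpha=1.9, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    Z = euler_map(X, alpha)                              # n x d complex matrix
    centers = Z[rng.choice(len(Z), k, replace=False)]    # random initial centers
    for _ in range(n_iter):
        # Squared Euclidean distances in the complex feature space
        d2 = (np.abs(Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

if __name__ == "__main__":
    # Two synthetic groups with features scaled to [0, 1]
    X = np.vstack([np.random.rand(50, 2) * 0.3,
                   np.random.rand(50, 2) * 0.3 + 0.6])
    labels, _ = euler_kmeans(X, k=2)
    print(labels)
```

Because the Euler map keeps the data dimension unchanged, the memory cost here grows with n × d rather than with an n × n kernel matrix, which is the large-scale advantage the abstract refers to.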