Algorithms for clustering high dimensional and distributed data

  • Authors:
  • Tao Li; Shenghuo Zhu; Mitsunori Ogihara

  • Affiliations:
  • Computer Science Department, University of Rochester, Rochester, NY 14627-0226, USA (all three authors). E-mail: {taoli,zsh,ogihara}@cs.rochester.edu. Corresponding author: Tel.: +1 585 275 8479; Fax: +1 585 273 4556; E-mail: taoli@cs.rochester.edu

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2003

Abstract

Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. The problem has been widely studied in machine learning, databases, and statistics. This paper studies the clustering of high-dimensional data and proposes CoFD, a non-distance-based clustering algorithm for high-dimensional spaces. Based on the Maximum Likelihood Principle, CoFD tunes its parameters to maximize the likelihood of the data points under the model that those parameters define. Distributed versions of the algorithm, called the D-CoFD algorithms, are also proposed. Experimental results on both synthetic and real data sets demonstrate the efficiency and effectiveness of the CoFD and D-CoFD algorithms.
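
The abstract does not spell out the CoFD model, so the following is not CoFD itself. It is only a minimal sketch of the general idea of likelihood-driven, non-distance-based clustering, assuming binary feature vectors and independent Bernoulli features per cluster. The function names (`log_likelihood`, `fit_hard_bernoulli`), the round-robin initialization, and the Laplace smoothing are all illustrative assumptions, not taken from the paper.

```python
import math


def log_likelihood(point, theta):
    """Log-probability of a binary vector under independent Bernoulli features."""
    return sum(math.log(p if x else 1.0 - p) for x, p in zip(point, theta))


def fit_hard_bernoulli(data, k, max_iter=50):
    """Alternate parameter estimation and hard assignment until labels stop changing.

    Illustrative stand-in for a likelihood-based clusterer; NOT the CoFD algorithm.
    """
    n, d = len(data), len(data[0])
    # Assumption: arbitrary round-robin starting labels.
    labels = [i % k for i in range(n)]
    thetas = []
    for _ in range(max_iter):
        # Parameter step: estimate P(feature = 1 | cluster) from current members.
        # Laplace smoothing (+1 / +2) keeps every probability strictly inside
        # (0, 1), so the logarithms below are always finite.
        thetas = []
        for c in range(k):
            members = [data[i] for i in range(n) if labels[i] == c]
            thetas.append(
                [(sum(m[j] for m in members) + 1) / (len(members) + 2)
                 for j in range(d)]
            )
        # Assignment step: give each point to the cluster under which it is
        # most likely. No pairwise distance between points is ever computed.
        new_labels = [
            max(range(k), key=lambda c: log_likelihood(p, thetas[c]))
            for p in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, thetas


if __name__ == "__main__":
    # Two groups of binary points that mostly use disjoint feature subsets.
    points = [
        [1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    labels, thetas = fit_hard_bernoulli(points, k=2)
    print(labels)
```

Like any alternating scheme of this kind, the sketch can settle in a poor local optimum depending on the starting labels, so in practice one would likely try several initializations and keep the one with the highest total log-likelihood.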