Discovery of feature-based hot spots using supervised clustering

  • Authors:
  • Wei Ding; Tomasz F. Stepinski; Rachana Parmar; Dan Jiang; Christoph F. Eick

  • Affiliations:
  • Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125-3393, USA; Lunar and Planetary Institute, 3600 Bay Area Blvd., Houston, TX 77058, USA; Department of Computer Science, University of Houston, Houston, TX 77204-3010, USA; Department of Computer Science, University of Houston, Houston, TX 77204-3010, USA; Department of Computer Science, University of Houston, Houston, TX 77204-3010, USA

  • Venue:
  • Computers & Geosciences
  • Year:
  • 2009

Abstract

Feature-based hot spots are localized regions where the attributes of objects attain high values. There is considerable interest in the automatic identification of such hot spots. This paper approaches the problem of finding feature-based hot spots from a data mining perspective and describes a method that relies on supervised clustering to produce a list of hot spot regions. Supervised clustering uses a fitness function that rewards isolation of the hot spots to optimally subdivide the dataset. The clusters in the optimal division are ranked by their interestingness, a measure that encapsulates their utility as hot spots. Hot spots are associated with the top-ranked clusters. The effectiveness of supervised clustering as a hot spot identification method is evaluated for four conceptually different clustering algorithms using a dataset describing the spatial distribution of ground ice on Mars. Clustering solutions are visualized by specially developed raster approximations, which are also used to further assess the ability of the different algorithms to yield hot spots. The density-based clustering algorithm is found to be the most effective for hot spot identification. The results of hot spot discovery by supervised clustering are comparable to those obtained using the G* statistic, but the new method offers a high degree of automation, making it an ideal tool for mining large datasets for the existence of potential hot spots.
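
The abstract does not give the fitness function or the interestingness measure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical interestingness equal to the amount by which a cluster's mean attribute value exceeds a `threshold`, and a hypothetical fitness that sums interestingness weighted by cluster size raised to a power `beta`. The function names, `threshold`, `beta`, and the toy data are all illustrative assumptions.

```python
from collections import defaultdict


def group_by_cluster(labels, values):
    """Collect attribute values per cluster label."""
    clusters = defaultdict(list)
    for label, value in zip(labels, values):
        clusters[label].append(value)
    return clusters


def interestingness(values, threshold):
    """Assumed measure: excess of the cluster mean over a threshold, else 0."""
    mean = sum(values) / len(values)
    return max(0.0, mean - threshold)


def fitness(labels, values, threshold, beta=1.5):
    """Assumed fitness: sum of interestingness * (cluster size ** beta).

    With beta > 1, larger interesting clusters are rewarded, while clusters
    that mix high and low values score low because their mean is diluted.
    """
    clusters = group_by_cluster(labels, values)
    return sum(
        interestingness(vals, threshold) * len(vals) ** beta
        for vals in clusters.values()
    )


def rank_hot_spots(labels, values, threshold):
    """Return (label, interestingness) pairs with positive score, best first."""
    clusters = group_by_cluster(labels, values)
    scored = [
        (label, interestingness(vals, threshold))
        for label, vals in clusters.items()
    ]
    positive = [pair for pair in scored if pair[1] > 0]
    return sorted(positive, key=lambda pair: pair[1], reverse=True)


# Toy data: six objects with one attribute (e.g. an ice-abundance proxy).
values = [0.9, 0.8, 0.1, 0.2, 0.6, 0.7]
isolating = ["a", "a", "b", "b", "c", "c"]   # separates high from low values
merged = ["x"] * 6                           # one cluster for everything

ranked = rank_hot_spots(isolating, values, threshold=0.5)
print(ranked)  # cluster "a" ranks first, then "c"; "b" is dropped
print(fitness(isolating, values, 0.5) > fitness(merged, values, 0.5))
```

In this toy setting the partition that isolates the high-valued objects scores a higher fitness than the single merged cluster, which is the behavior a clustering algorithm would exploit when searching for the optimal division.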