Comparing predictive power in climate data: clustering matters

Authors:
Karsten Steinhaeuser; Nitesh V. Chawla; Auroop R. Ganguly
Affiliations:
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, and Oak Ridge National Laboratory, Oak Ridge, TN; Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN; Oak Ridge National Laboratory, Oak Ridge, TN
Venue:
SSTD'11: Proceedings of the 12th International Conference on Advances in Spatial and Temporal Databases
Year:
2011

Abstract

Various clustering methods have been applied to climate, ecological, and other environmental datasets, for example to define climate zones and to automate land-use classification. Measuring the "goodness" of such clusters is generally application-dependent and highly subjective, often requiring domain expertise and/or validation with field data (which can be costly or even impossible to acquire). Here we focus on one particular task: the extraction of ocean climate indices from observed climatological data. For this task, the relative performance of different clustering methods can be quantified. Specifically, we propose to extract indices using complex networks constructed from climate data, which have been shown to effectively capture the dynamical behavior of the global climate system, and we compare their predictive power to that of candidate indices obtained with other popular clustering methods. Our results demonstrate that network-based clusters are statistically significantly better predictors of land climate than the clusters produced by any of the other methods evaluated, which could lead to a deeper understanding of climate processes and complement physics-based climate models.
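The evaluation idea above (cluster grid points, turn each cluster into a candidate index, then score how well the indices predict a land-climate series) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the inputs are already deseasonalized anomaly series, links grid points whose absolute Pearson correlation meets a hypothetical `threshold`, and uses plain connected components as a simplified stand-in for the community-detection step a network-based method would typically use. Each cluster's mean series becomes a candidate index, and predictive power is scored as hold-out RMSE of a least-squares linear regression. All function names and parameter values here are hypothetical.

```python
from collections import deque

import numpy as np


def correlation_network(series: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean adjacency matrix linking grid points whose anomaly series
    have |Pearson r| >= threshold. `series` has shape (n_points, n_timesteps)."""
    corr = np.corrcoef(series)          # rows are treated as variables
    adj = np.abs(corr) >= threshold
    np.fill_diagonal(adj, False)        # no self-loops
    return adj


def connected_components(adj: np.ndarray) -> np.ndarray:
    """Label each node with its connected-component id via breadth-first search.
    Simplified stand-in for a proper community-detection algorithm."""
    n = adj.shape[0]
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] == -1:
                    labels[nb] = current
                    queue.append(nb)
        current += 1
    return labels


def cluster_indices(series: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Candidate index per cluster: the mean anomaly series of its members.
    Returns shape (n_clusters, n_timesteps)."""
    return np.array([series[labels == k].mean(axis=0) for k in range(labels.max() + 1)])


def holdout_rmse(indices: np.ndarray, target: np.ndarray, train_frac: float = 0.75) -> float:
    """Fit target ~ intercept + indices on the first `train_frac` of time steps
    and return the RMSE on the remaining (held-out) time steps."""
    t = indices.shape[1]
    X = np.column_stack([np.ones(t), indices.T])
    n_train = int(train_frac * t)
    coef, *_ = np.linalg.lstsq(X[:n_train], target[:n_train], rcond=None)
    pred = X[n_train:] @ coef
    return float(np.sqrt(np.mean((target[n_train:] - pred) ** 2)))
```

Because `holdout_rmse` only consumes cluster labels through `cluster_indices`, labels from any other clustering method (for example k-means) can be passed in the same way, which is the kind of like-for-like comparison the abstract describes. The statistical significance testing across methods is not reproduced here.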