CART algorithm for spatial data: Application to environmental and ecological data

Authors:
L. Bel;D. Allard;J. M. Laurent;R. Cheddadi;A. Bar-Hen
Affiliations:
UMR 518 AgroParisTech/INRA, 16 rue Claude Bernard, 75231 PARIS Cedex 05, France;Unité Biostatistique & Processus Spatiaux, INRA, 84914 Avignon, France;Institut des Sciences de l'Évolution, CP 61, CNRS UMR 5554, 34095 Montpellier, France;Institut des Sciences de l'Évolution, CP 61, CNRS UMR 5554, 34095 Montpellier, France;Université Paris Descartes, Paris 5, UMR CNRS 8145 MAP5, 45 rue des Saints Pères, 75270 Paris cedex 06, France
Venue:
Computational Statistics & Data Analysis
Year:
2009



Abstract

Most statistical learning techniques, such as Classification And Regression Trees (CART), assume independent samples when computing classification rules. This assumption is convenient for estimating the quantities involved in the algorithm and for assessing the asymptotic properties of estimators. In many environmental or ecological applications, however, the data under study are a sample of regionalized variables, which can be modeled as random fields with spatial dependence. When the sampling scheme is very irregular, a direct application of supervised classification algorithms leads to biased discriminant rules due, for example, to the possible oversampling of some areas. The CART algorithm is adapted to the case of spatially dependent samples, focusing on environmental and ecological applications. Two approaches are considered. The first takes the irregularity of the sampling into account by weighting the data according to their spatial pattern, using two existing methods based on Voronoi tessellation and on a regular grid, and one original method based on kriging. The second uses spatial estimates of the quantities involved in the construction of the discriminant rule at each step of the algorithm. These methods are tested on simulations and on a classical data set to highlight their advantages and drawbacks. They are then applied to an ecological data set to explore the relationship between pollen data and the presence/absence of tree species, which is an important question for climate reconstruction based on paleoecological data.
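
As an illustration of the first approach, the idea of weighting samples by a regular grid can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `grid_weights` and `weighted_gini`, the cell size, and the toy coordinates are all hypothetical, and the Gini impurity is used only as a common example of a quantity computed when growing a classification tree. Each point receives a weight inversely proportional to the number of points sharing its grid cell, so that a densely sampled area does not dominate the class proportions.

```python
from collections import Counter
from math import floor


def grid_weights(points, cell_size):
    """Grid-based weights (hypothetical helper): points in the same
    cell share that cell's unit weight; weights are normalised to sum to 1."""
    cells = [(floor(x / cell_size), floor(y / cell_size)) for x, y in points]
    counts = Counter(cells)
    raw = [1.0 / counts[c] for c in cells]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_gini(labels, weights):
    """Gini impurity with class proportions estimated from sample
    weights instead of raw counts."""
    total = sum(weights)
    mass = {}
    for label, w in zip(labels, weights):
        mass[label] = mass.get(label, 0.0) + w
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


# Toy data (assumed): three clustered points of class "a" in one cell
# and one isolated point of class "b" in another cell.
points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (2.5, 2.5)]
labels = ["a", "a", "a", "b"]

weights = grid_weights(points, cell_size=1.0)
unweighted = weighted_gini(labels, [1.0] * len(labels))
declustered = weighted_gini(labels, weights)
```

In this toy case the three clustered points together carry the same weight as the isolated point, so the weighted class proportions become balanced and the impurity differs from the one computed on raw counts. The choice of cell size is left open here; it would need to be tuned to the sampling pattern.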