Predicting the potential habitat of oaks with data mining models and the R system

Authors:
Rafael Pino-Mejías;María Dolores Cubiles-de-la-Vega;María Anaya-Romero;Antonio Pascual-Acosta;Antonio Jordán-López;Nicolás Bellinfante-Crocci
Affiliations:
Department of Statistics and Operational Research, University of Seville, Avda. Reina Mercedes s/n, Seville, Spain;Department of Statistics and Operational Research, University of Seville, Avda. Reina Mercedes s/n, Seville, Spain;Department of Crystallography, Mineralogy and Agricultural Chemistry, University of Seville, Avda. Reina Mercedes s/n, Seville, Spain;Andalusian Prospective Center, Avda. Reina Mercedes s/n, Seville, Spain;Department of Crystallography, Mineralogy and Agricultural Chemistry, University of Seville, Avda. Reina Mercedes s/n, Seville, Spain;Department of Crystallography, Mineralogy and Agricultural Chemistry, University of Seville, Avda. Reina Mercedes s/n, Seville, Spain
Venue:
Environmental Modelling & Software
Year:
2010

Abstract

Oak forests are essential to the ecosystems of many countries, particularly when they are used in vegetation restoration. Models that predict the potential habitat of oaks can therefore be a valuable tool for environmental work. With this objective, this paper presents the building and comparison of data mining models for predicting the potential habitat of the oak forest type in Mediterranean areas (southern Spain), with conclusions applicable to other regions. Thirty-one environmental input variables were measured, and six base models for supervised classification were selected: linear and quadratic discriminant analysis, logistic regression, classification trees, neural networks and support vector machines. Three ensemble methods, which combine classification trees fitted to samples and sets of variables generated from the original data set, were also evaluated: bagging, random forests and boosting. The available data set was randomly split into three parts: a training set (50%), a validation set (25%) and a test set (25%). Analysis of the accuracy, sensitivity, specificity and area under the ROC curve on the test set reveals that the best models for our oak data set are bagging and random forests. All of these models can be fitted with free R programs that use the libraries and functions described in this paper. Furthermore, the methodology used in this study will allow researchers to determine the potential distribution of oaks in other kinds of areas.
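
The evaluation protocol described in the abstract (a random 50%/25%/25% split, then accuracy, sensitivity, specificity and area under the ROC curve on the test set) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' R code: the function names `split_50_25_25`, `confusion_metrics` and `roc_auc`, the `seed` parameter and the 0/1 label coding (1 = oak present) are all assumptions made for the example.

```python
import random


def split_50_25_25(n_rows, seed=0):
    """Randomly partition row indices into training (50%),
    validation (25%) and test (remaining rows, about 25%) sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seed is a hypothetical parameter
    n_train = n_rows // 2
    n_valid = n_rows // 4
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels.
    Assumes both classes occur in y_true (otherwise a division by zero)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive case scores higher than a randomly chosen
    negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In a workflow of this kind, the training indices would be used to fit each candidate model, the validation indices to tune it, and the test indices only for the final comparison of the four measures.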