REG^2: a regional regression framework for geo-referenced datasets

  • Authors: Oner Ulvi Celepcikay; Christoph F. Eick
  • Affiliations: University of Houston, Houston, TX (both authors)
  • Venue: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
  • Year: 2009

Abstract

Traditional regression analysis derives global relationships between variables and neglects spatial variation in those variables. Hence it cannot systematically discover regional relationships or build models that exploit this regional knowledge to achieve higher prediction accuracy. Since most relationships in spatial datasets are regional, there is a great need for regional regression methods that derive regression functions reflecting the distinct spatial characteristics of different regions. This paper proposes a novel regional regression framework that first discovers interesting regions showing strong regional relationships between the dependent and independent variables, and then builds a prediction model with a regional regression function associated with each region. Interesting regions are identified by running a representative-based clustering algorithm that maximizes an externally plugged-in fitness function. In this work, we propose two fitness functions: an R-squared-based fitness function, and an AIC-based fitness function that handles overfitting better. We evaluate our framework in two case studies: (1) identifying causes of arsenic contamination in Texas water wells, and (2) determining the spatially varying effects of house properties on house prices in the Boston Housing dataset. The results demonstrate that our framework effectively identifies interesting regions and builds better prediction systems that rely on regional models.
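
As an illustration of why regional models can outperform a single global one, the following minimal sketch scores a global fit against two regional fits on toy data. It is not the authors' fitness function, which the abstract does not define. The helper names `fit_line` and `region_scores`, the toy data, and the two hand-assigned regions are all hypothetical, and the AIC is assumed to take the common Gaussian least-squares form n ln(RSS/n) + 2k, which omits an additive constant. The clustering step that would discover the regions is not shown.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def region_scores(xs, ys, k=2):
    """Return (R-squared, AIC, RSS) of a line fitted to one region.

    k is the number of fitted parameters (assumed: intercept and slope).
    AIC uses the assumed form n * ln(RSS / n) + 2k; it needs RSS > 0.
    """
    intercept, slope = fit_line(xs, ys)
    n = len(ys)
    mean_y = sum(ys) / n
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - rss / tss
    aic = n * math.log(rss / n) + 2 * k
    return r2, aic, rss

# Toy data (hypothetical): two regions whose relationships have opposite signs.
xs_a, ys_a = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]   # y rises with x
xs_b, ys_b = [1, 2, 3, 4, 5], [9.9, 8.1, 5.8, 4.2, 1.9]    # y falls with x

r2_a, aic_a, rss_a = region_scores(xs_a, ys_a)
r2_b, aic_b, rss_b = region_scores(xs_b, ys_b)

# A single global line over the pooled data hides both relationships.
r2_global, aic_global, rss_global = region_scores(xs_a + xs_b, ys_a + ys_b)

# AIC of the two-region model on the same pooled data: the residuals are
# summed across regions, and the penalty counts both regions' parameters.
n_total = len(xs_a) + len(xs_b)
aic_regional = n_total * math.log((rss_a + rss_b) / n_total) + 2 * (2 + 2)

print(f"R^2 region A: {r2_a:.3f}, region B: {r2_b:.3f}, global: {r2_global:.3f}")
print(f"AIC regional model: {aic_regional:.2f}, global model: {aic_global:.2f}")
```

In this toy case the global fit explains almost none of the variance while each regional fit explains nearly all of it. The regional model also has the lower AIC, even though it is penalized for twice as many parameters.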