REG^2: a regional regression framework for geo-referenced datasets

Authors:
Oner Ulvi Celepcikay;Christoph F. Eick
Affiliations:
University of Houston, Houston, TX;University of Houston, Houston, TX
Venue:
Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
Year:
2009

Abstract

Traditional regression analysis derives global relationships between variables and neglects spatial variation in those variables. It therefore cannot systematically discover regional relationships, nor build models that exploit regional knowledge to achieve higher prediction accuracy. Since most relationships in spatial datasets are regional, there is a great need for regional regression methods that derive regression functions reflecting the distinct spatial characteristics of different regions. This paper proposes a novel regional regression framework that first discovers interesting regions showing strong regional relationships between the dependent and independent variables, and then builds a prediction model with a regional regression function associated with each region. Interesting regions are identified by running a representative-based clustering algorithm that maximizes an externally plugged-in fitness function. In this work, we propose two fitness functions: an R-squared-based fitness function and an AIC-based fitness function that handles overfitting better. We evaluate our framework in two case studies: (1) identifying causes of arsenic contamination in Texas water wells and (2) determining the spatially varying effects of house properties on house prices in the Boston Housing dataset. We demonstrate that our framework effectively identifies interesting regions and builds better prediction systems that rely on regional models.
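
The abstract does not give the exact form of the fitness functions or the clustering algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. Objects are assigned to their nearest representative, an ordinary least-squares model is fitted per region, and a plug-in fitness scores the partition. The size-weighted R-squared fitness, the textbook least-squares AIC form n·ln(RSS/n) + 2k, the synthetic data, and all function names are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; returns coefficients and residual sum of squares."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid)

def r_squared(X, y):
    """Coefficient of determination of an OLS fit of y on X."""
    _, rss = fit_ols(X, y)
    tss = float(((y - y.mean()) ** 2).sum())
    return 1.0 - rss / tss if tss > 0 else 0.0

def aic(X, y):
    """Textbook least-squares AIC, n*ln(RSS/n) + 2k (assumed form, not taken from the paper)."""
    n, k = len(y), X.shape[1] + 1
    _, rss = fit_ols(X, y)
    return n * np.log(max(rss, 1e-12) / n) + 2 * k

def assign_regions(coords, reps):
    """Representative-based partition: each object joins its nearest representative."""
    dist = np.linalg.norm(coords[:, None, :] - reps[None, :, :], axis=2)
    return dist.argmin(axis=1)

def fitness(X, y, labels, n_regions):
    """Hypothetical R-squared-based fitness: size-weighted mean of per-region R^2."""
    total = 0.0
    for r in range(n_regions):
        mask = labels == r
        if mask.sum() > X.shape[1] + 1:  # skip regions too small to fit
            total += mask.sum() * r_squared(X[mask], y[mask])
    return total / len(y)

# Synthetic data (assumption): the slope flips sign between west and east halves.
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0.0, 1.0, size=(n, 2))
X = rng.normal(size=(n, 1))
west = coords[:, 0] < 0.5
y = np.where(west, 2.0, -2.0) * X[:, 0] + 0.1 * rng.normal(size=n)

reps = np.array([[0.25, 0.5], [0.75, 0.5]])  # hand-picked representatives
labels = assign_regions(coords, reps)

global_r2 = r_squared(X, y)
regional_fit = fitness(X, y, labels, len(reps))
print("global R^2:", round(global_r2, 3))
print("regional fitness:", round(regional_fit, 3))
```

In this toy setting a single global model explains little of the variance because the two regional effects cancel out, while the per-region models fit well. A search procedure would move the representatives to maximize the chosen fitness; here they are fixed by hand for brevity.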