Traditional regression analysis derives global relationships between variables and neglects spatial variation in those variables. Hence it lacks the ability to systematically discover regional relationships and to build better models that use this regional knowledge to obtain higher prediction accuracy. Since most relationships in spatial datasets are regional, there is a great need for regional regression methods that derive regression functions reflecting the different spatial characteristics of different regions. This paper proposes a novel regional regression framework that first discovers interesting regions showing strong regional relationships between the dependent and independent variables, and then builds a prediction model with a regional regression function associated with each region. Interesting regions are identified by running a representative-based clustering algorithm that maximizes an externally plugged-in fitness function. In this work, we propose two fitness functions: an R-squared-based fitness function, and an AIC-based fitness function that handles overfitting better. We evaluate our framework in two case studies: (1) identifying causes of arsenic contamination in Texas water wells, and (2) determining spatially varying effects of house properties on house prices in the Boston Housing dataset. We demonstrate that our framework effectively identifies interesting regions and builds better prediction systems that rely on regional models.
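To make the idea of a plugged-in fitness function concrete, the following is a minimal sketch, not the paper's implementation. It assumes that regions are formed by assigning each point to its nearest representative, that each region gets its own ordinary least-squares fit, and that a clustering is scored by the size-weighted average of the regional R-squared values. The abstract does not give the exact form of either fitness function, so the scoring rule, the Gaussian-error form of AIC, and all names here (`fit_ols`, `assign_regions`, `r2_fitness`, `total_aic`, `min_size`) are illustrative assumptions. The data is synthetic.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; returns coefficients and residual sum of squares."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid)

def r_squared(X, y):
    """Coefficient of determination of the OLS fit of y on X."""
    _, rss = fit_ols(X, y)
    tss = float(((y - y.mean()) ** 2).sum())
    return 1.0 - rss / tss if tss > 0 else 0.0

def aic(X, y):
    """AIC under Gaussian errors, up to an additive constant: n*ln(RSS/n) + 2k."""
    n, k = len(y), X.shape[1] + 1
    _, rss = fit_ols(X, y)
    return n * np.log(max(rss, 1e-12) / n) + 2 * k

def assign_regions(coords, representatives):
    """Assumed region model: each point belongs to its nearest representative."""
    d = np.linalg.norm(coords[:, None, :] - representatives[None, :, :], axis=2)
    return d.argmin(axis=1)

def r2_fitness(coords, X, y, representatives, min_size=5):
    """Hypothetical fitness: size-weighted mean of regional R-squared values."""
    labels = assign_regions(coords, representatives)
    total = 0.0
    for r in range(len(representatives)):
        mask = labels == r
        if mask.sum() >= min_size:  # skip regions too small to fit reliably
            total += mask.sum() * r_squared(X[mask], y[mask])
    return total / len(y)

def total_aic(coords, X, y, representatives):
    """Sum of regional AIC values; lower is better, extra regions cost parameters."""
    labels = assign_regions(coords, representatives)
    return sum(aic(X[labels == r], y[labels == r])
               for r in range(len(representatives)) if (labels == r).any())

# Synthetic data: the slope of y on x flips sign between west and east halves.
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0.0, 1.0, size=(n, 2))
X = rng.normal(size=(n, 1))
slope = np.where(coords[:, 0] < 0.5, 2.0, -2.0)
y = slope * X[:, 0] + 0.1 * rng.normal(size=n)

one_region = np.array([[0.5, 0.5]])
two_regions = np.array([[0.25, 0.5], [0.75, 0.5]])

global_fit = r2_fitness(coords, X, y, one_region)
regional_fit = r2_fitness(coords, X, y, two_regions)
global_aic = total_aic(coords, X, y, one_region)
regional_aic = total_aic(coords, X, y, two_regions)
print(f"R2 fitness: global={global_fit:.3f}, regional={regional_fit:.3f}")
print(f"AIC total:  global={global_aic:.1f}, regional={regional_aic:.1f}")
```

In this sketch a representative-based clustering algorithm would search over candidate sets of representatives and keep the set with the best score. Because AIC adds a penalty of 2k per regional model, an AIC-based score discourages splitting the data into many small regions whose high R-squared values come from overfitting.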