Integrated approach for the exploration of geospatial datasets: the interaction of concepts, methods and data

  • Authors: Xiping Dai
  • Affiliations: The Pennsylvania State University
  • Venue: Integrated approach for the exploration of geospatial datasets: the interaction of concepts, methods and data
  • Year: 2007

Visualization

Abstract

Categories are the basic concepts and building blocks of human knowledge for understanding, exploring and describing the world. Geographers use categories to conceptualize, interpret and communicate phenomena such as land cover, urbanization and regions of economic growth. Inductive machine learning, enabled by computers, has proved to be a powerful tool for categorization in increasingly complex geographical datasets. Machine-learning tools can locate clusters of patterns or partitions in the feature space constructed from the variables, but these results are often hard for humans to understand and interpret. Furthermore, the data-driven clusters or patterns are not guaranteed to correspond to the “information classes” that are meaningful and important to users. This research provides a two-part solution to this problem. In the first part, a cognitive model of category development is synthesized that emphasizes the integration of data-driven and theory- or knowledge-based categories. Inductive learning from data examples provides data-driven classes or clusters. Human knowledge is employed in supervised category development to produce generalized category models for GIS communities. The construction of categories is thus a problem of integrating “information classes” (categories) with the data-driven classes provided by machine learning. The model provides an integrated approach that combines machine learning with human expert knowledge and thereby improves the communication, representation and sharing of categories during their development. In the second part, the integrated category development model is structured and optimized by combining visualization techniques and exploratory statistics with machine-learning tools. The visualization interface enables preprocessing of examples, facilitates an examination of the uncertainty in category design, and allows users to visually explore feature space.
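The integration problem described above, matching data-driven clusters to user-defined “information classes”, can be illustrated with a simple cross-tabulation. The sketch below is not the dissertation's method; it is a minimal illustration under assumed inputs. The function names (`cross_tabulate`, `cluster_purity`), the cluster IDs and the land-cover labels are all hypothetical.

```python
from collections import Counter, defaultdict


def cross_tabulate(cluster_ids, class_labels):
    """Count how often each information class occurs inside each cluster."""
    table = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        table[cluster][label] += 1
    return table


def cluster_purity(table):
    """For each cluster, return its dominant class and that class's share."""
    summary = {}
    for cluster, counts in table.items():
        label, n = counts.most_common(1)[0]
        summary[cluster] = (label, n / sum(counts.values()))
    return summary


# Hypothetical data: cluster IDs from some unsupervised learner, paired with
# information classes assigned by a human expert to the same examples.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
class_labels = ["forest", "forest", "forest",
                "urban", "urban", "water", "urban",
                "water", "forest", "water"]

table = cross_tabulate(cluster_ids, class_labels)
purity = cluster_purity(table)

for cluster in sorted(purity):
    label, share = purity[cluster]
    print(f"cluster {cluster}: dominant class {label!r}, share {share:.2f}")
```

A cluster whose dominant share is well below 1 mixes several information classes, which is one plausible signal that the data-driven partition and the expert categories disagree and need human review.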
The combination of visualization and machine learning supports the construction of categories, including the exploration of appropriate methods for category development, e.g. choosing between different types of classifiers, selecting appropriate training examples and rejecting outliers. This combined method incorporates human expertise into data-driven machine learning through the visual interface, which allows control over the machine-learning tools and coordination among them. Communication among the examples, descriptions, categories and human expertise is thus enhanced, and an integrated category development system is achieved. The integrated category development model is implemented as a series of visual and computational components connected into a workflow design using GeoVISTA Studio.
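As a rough illustration of the training-example screening mentioned above, the following sketch flags examples that lie unusually far from their own class centroid in feature space. It is an assumption-laden stand-in, not the GeoVISTA Studio workflow: the function `flag_outliers`, the `max_ratio` threshold and the two-feature example data are hypothetical, and in the approach described here a flagged example would be a candidate for the user to inspect visually rather than something rejected automatically.

```python
import math
import statistics
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def flag_outliers(examples, max_ratio=2.0):
    """Return indices of examples far from their own class centroid.

    An example is flagged when its distance to the class centroid exceeds
    max_ratio times the median distance within that class (a simple
    heuristic chosen for this sketch).
    """
    by_class = defaultdict(list)
    for index, (features, label) in enumerate(examples):
        by_class[label].append((index, features))

    flagged = []
    for members in by_class.values():
        center = centroid([f for _, f in members])
        distances = [(i, math.dist(f, center)) for i, f in members]
        typical = statistics.median(d for _, d in distances)
        if typical == 0:
            continue  # all examples coincide; nothing to flag
        flagged.extend(i for i, d in distances if d > max_ratio * typical)
    return sorted(flagged)


# Hypothetical training examples: (two feature values, expert-assigned class).
examples = [
    ((0.10, 0.80), "forest"),
    ((0.20, 0.90), "forest"),
    ((0.15, 0.85), "forest"),
    ((0.90, 0.10), "forest"),  # looks like an urban pixel labeled as forest
    ((0.80, 0.20), "urban"),
    ((0.85, 0.15), "urban"),
    ((0.90, 0.10), "urban"),
]

flagged = flag_outliers(examples)
print("candidates for review:", flagged)
```

The median is used as the reference distance because a single extreme example shifts it less than it would shift a mean, though it still pulls the centroid itself; a user exploring feature space visually can judge whether a flagged example is an error or a legitimate edge case.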