A methodological approach to mining and simulating data in complex information systems

  • Authors:
  • Marina V. Sokolova;Antonio Fernández-Caballero

  • Affiliations:
  • Instituto de Investigación en Informática de Albacete i3A and Departamento de Sistemas Informáticos, Universidad de Castilla-La Mancha, Albacete, Spain;Instituto de Investigación en Informática de Albacete i3A and Departamento de Sistemas Informáticos, Universidad de Castilla-La Mancha, Albacete, Spain

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2013

Abstract

Complex emergent systems are known to be poorly managed because of their inherent complexity. This article introduces a novel interdisciplinary approach to their study: DeciMaS, a methodological approach to mining and simulating data in complex information systems. The DeciMaS framework consists of three principal phases: preliminary domain and system analysis, system design and coding, and simulation and decision making. The framework offers a sequence of steps that supports a domain expert who is not a specialist in data mining throughout the knowledge discovery process. To this end, a generalized structure of a decision support system (DSS) has been worked out. The DSS is virtually and logically organized into a three-level architecture. The first level is dedicated to data retrieval, fusion and pre-processing; the second discovers knowledge from data; and the third deals with making decisions and generating output information. Data mining is aimed at solving the following problems: association, classification, function approximation, and clustering. DeciMaS populates the second logical level of the DSS with agents that carry out these tasks. The agents use a wide range of data mining procedures, including approaches for estimation and prediction: regression analysis, artificial neural networks (ANNs), self-organizing methods, in particular the Group Method of Data Handling, and hybrid methods. The association task is solved with artificial neural networks. The ANNs are trained with different training algorithms such as backpropagation, resilient propagation and genetic algorithms. To assess the proposal, an exhaustive experiment designed to evaluate the possible harm caused by environmental contamination to public health is presented in detail.
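
The three-level organization described in the abstract can be illustrated with a minimal sketch. This is not the DeciMaS implementation: the function names (`preprocess`, `fit_linear`, `decide`, `run_pipeline`), the data layout and the alert threshold are all hypothetical, the agents are omitted, and simple least-squares regression stands in for the wider set of methods the abstract lists.

```python
from statistics import mean


def preprocess(sources):
    """Level 1 (assumed): fuse records from several sources, drop incomplete ones."""
    fused = [rec for src in sources for rec in src]
    return [(x, y) for x, y in fused if x is not None and y is not None]


def fit_linear(pairs):
    """Level 2 (assumed): least-squares fit of y = a + b*x as function approximation."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    a = my - b * mx
    return a, b


def decide(model, x_new, threshold):
    """Level 3 (assumed): turn a prediction into output for the decision maker."""
    a, b = model
    predicted = a + b * x_new
    return {"predicted": predicted, "alert": predicted > threshold}


def run_pipeline(sources, x_new, threshold):
    """Chain the three levels: pre-processing, knowledge discovery, decision."""
    clean = preprocess(sources)
    model = fit_linear(clean)
    return decide(model, x_new, threshold)
```

As a usage example, `run_pipeline([[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (None, 1.0)]], x_new=4.0, threshold=7.0)` fuses two hypothetical sources, discards the incomplete record, fits the line, and returns a prediction together with an alert flag. Keeping each level as a separate function mirrors the separation of concerns the abstract describes, so a different level-2 method could be swapped in without touching the other two.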