Inductive learning from numerical and symbolic data: An integrated framework

  • Authors:
  • Floriana Esposito; Donato Malerba; Vittorio Marengo

  • Affiliations:
  • Department of Informatics, University of Bari, Italy. E-mail: {esposito,malerba}@di.uniba.it; Department of Informatics, University of Bari, Italy. E-mail: {esposito,malerba}@di.uniba.it; Department of Statistics, University of Bari, Italy. E-mail: vmarengo@dss.uniba.it

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2001

Abstract

Numerical induction from data is one of the tasks of statistical data analysis. It uses a tabular model, with almost exclusively numerical features, as its data representation formalism. The output representations vary: from functions to probability distributions, from surface equations to tables of indexes. One approach to extending classical data analysis techniques to symbolic objects is Symbolic Data Analysis: the input and output of classical techniques are expressed symbolically, which guarantees the comprehensibility of both the observations and the results, while the processing techniques, although appropriately adapted, retain the efficiency of classical statistical inferential models. In the field of Machine Learning, too, several methods have been proposed to extend inductive approaches from statistical data analysis to data represented as attribute-value pairs. Some of these approaches adapt ideas and principles from numerical induction to handle propositional calculus descriptions; others combine different techniques in order to treat numerical-continuous data and algebraic-symbolic data differently. The aim is to improve efficiency and preserve the expressive power of the representations during the learning process, and to retain the accuracy and flexibility of numerical techniques during the recognition phase. This kind of integration becomes increasingly complex when using first-order computational learning models, which are useful for handling object descriptions in structured domains, where not only the properties of objects but also the relations between different objects must be considered. Hence the need to integrate different computational strategies, knowledge representations and processing methods, either in a naive combination of classifiers or, more meaningfully, in a real integration within a single theoretical framework.
When building machine learning systems, increasing attention is given to handling symbolic and numerical information within the same system. In this paper, we address the problem of handling both numerical and symbolic data in first-order models, distinguishing between the phase of model generation from examples and the phase of model recognition by means of a flexible probabilistic subsumption test.