An introduction to symbolic data analysis and the SODAS software

Authors:
Edwin Diday;Floriana Esposito
Affiliations:
University Paris 9 Dauphine, Ceremade, Pl. du Maréchal de Lattre de Tassigny, 75016 Paris, France;Università di Bari, Dipartimento di Informatica, v. Orabona 4, 70125 Bari, Italy
Venue:
Intelligent Data Analysis
Year:
2003

Abstract

Data descriptions of units are called "symbolic" when they are more complex than standard ones because they contain internal variation and are structured. Symbolic data arise from many sources, for instance when huge relational databases are summarized by their underlying concepts. "Extracting knowledge" means obtaining explanatory results, and for this reason "symbolic objects" are introduced and studied in this paper. They model concepts and constitute an explanatory output for data analysis. Moreover, they can be used to define queries on a relational database and to propagate concepts between databases. We define "Symbolic Data Analysis" (SDA) as the extension of standard data analysis to symbolic data tables as input, in order to find symbolic objects as output. Any SDA is based on four spaces: the space of individuals, the space of concepts, the space of descriptions modelling individuals or classes of individuals, and the space of symbolic objects modelling concepts. These four spaces raise new problems, such as the quality, robustness and reliability of the approximation of a concept by a symbolic object, the symbolic description of a class, and the consensus between symbolic descriptions. In this paper we give an overview of recent developments in SDA. We briefly describe some SDA tools and methods and, in particular, some dissimilarity methods for symbolic objects, which are central to most symbolic data analysis methods. Finally, we introduce the software prototype developed by 17 teams from nine countries involved in the SODAS EUROSTAT project.
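To make the four spaces more concrete, the sketch below assumes a toy setting. Individuals are described by one numeric variable (age) and one categorical variable (region), and concepts are given by a grouping key (team). A class is described by generalizing its members to an interval and a set. A symbolic object is read as a conjunctive assertion whose extent is the set of individuals satisfying it. The dissimilarity adds a Hausdorff distance between intervals to a Jaccard distance between sets. All names, data, and the choice of dissimilarity are hypothetical illustrations. They are not the SODAS software or the specific measures discussed in the paper.

```python
from dataclasses import dataclass

# Hypothetical micro-data (space of individuals); "team" plays the role of the concept.
INDIVIDUALS = [
    {"id": 1, "team": "A", "age": 23, "region": "north"},
    {"id": 2, "team": "A", "age": 31, "region": "south"},
    {"id": 3, "team": "A", "age": 27, "region": "north"},
    {"id": 4, "team": "B", "age": 40, "region": "east"},
    {"id": 5, "team": "B", "age": 35, "region": "east"},
    {"id": 6, "team": "B", "age": 29, "region": "north"},
]


@dataclass(frozen=True)
class SymbolicDescription:
    """Hypothetical symbolic description: an interval for age, a set for region."""
    age: tuple[float, float]
    region: frozenset[str]


def describe(members: list[dict]) -> SymbolicDescription:
    """Generalize a class of individuals: min/max gives the interval, union gives the set."""
    ages = [m["age"] for m in members]
    return SymbolicDescription(
        age=(min(ages), max(ages)),
        region=frozenset(m["region"] for m in members),
    )


def symbolic_table(individuals: list[dict], concept_key: str) -> dict[str, SymbolicDescription]:
    """Build a symbolic data table with one row per concept (group of individuals)."""
    groups: dict[str, list[dict]] = {}
    for ind in individuals:
        groups.setdefault(ind[concept_key], []).append(ind)
    return {concept: describe(members) for concept, members in groups.items()}


def extent(desc: SymbolicDescription, individuals: list[dict]) -> list[int]:
    """Ids of individuals satisfying the assertion [age in interval] AND [region in set]."""
    lo, hi = desc.age
    return [
        ind["id"]
        for ind in individuals
        if lo <= ind["age"] <= hi and ind["region"] in desc.region
    ]


def dissimilarity(a: SymbolicDescription, b: SymbolicDescription) -> float:
    """Illustrative measure: Hausdorff distance between age intervals
    plus Jaccard distance between region sets (unnormalized sum)."""
    d_age = max(abs(a.age[0] - b.age[0]), abs(a.age[1] - b.age[1]))
    union = a.region | b.region
    d_region = 1 - len(a.region & b.region) / len(union) if union else 0.0
    return d_age + d_region


if __name__ == "__main__":
    table = symbolic_table(INDIVIDUALS, "team")
    for concept, desc in table.items():
        print(concept, desc, extent(desc, INDIVIDUALS))
    print(dissimilarity(table["A"], table["B"]))
```

With this toy data, the extent of team A's description is [1, 2, 3, 6]: individual 6 belongs to team B but satisfies A's assertion. This illustrates why the quality of a concept's approximation by a symbolic object is a problem in its own right. The dissimilarity between A and B is 9 + 2/3. Because it adds a distance in years to a value in [0, 1], a real analysis would normally normalize each variable first.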