Comparing relational and non-relational algorithms for clustering propositional data

  • Authors:
  • Robson Motta; Alneu de Andrade Lopes; Bruno M. Nogueira; Solange O. Rezende; Alípio M. Jorge; Maria Cristina Ferreira de Oliveira

  • Affiliations:
  • VICG, ICMC, University of Sao Paulo, Sao Carlos, SP, Brazil; LABIC, ICMC, University of Sao Paulo, Sao Carlos, SP, Brazil; LABIC, ICMC, University of Sao Paulo, Sao Carlos, SP, Brazil; LABIC, ICMC, University of Sao Paulo, Sao Carlos, SP, Brazil; LIAAD - INESC TEC, DCC, FCUP, University of Porto, Portugal; VICG, ICMC, University of Sao Paulo, Sao Carlos, SP, Brazil

  • Venue:
  • Proceedings of the 28th Annual ACM Symposium on Applied Computing
  • Year:
  • 2013

Abstract

Cluster detection methods are widely studied in Propositional Data Mining, where each example is represented individually as a feature vector. Such data is naturally non-relational, but it can be given a relational form through similarity-based network models, in which examples are vertices and an edge connects two highly similar examples. This relational representation makes network-based algorithms from Relational Data Mining applicable. In clustering tasks specifically, it allows community detection algorithms to be used to find data clusters. In this work, we compare traditional clustering algorithms that operate on non-relational data with cluster detection algorithms that operate on relational data, evaluating them with measures used for community detection in networks. We carried out an exploratory analysis over 23 numerical datasets and 10 textual datasets. The results show that network models can efficiently represent the data topology, which allows clusters to be detected with higher precision than with non-relational methods.
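The relational pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say which network models or community detection algorithms the authors used, so everything here is an assumption made for illustration: a k-nearest-neighbour graph under Euclidean distance stands in for the similarity-based network, and a simple label propagation with deterministic tie-breaking stands in for community detection. The function names `knn_graph` and `label_propagation` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_graph(points, k):
    """Build an undirected k-nearest-neighbour graph (assumed network model).

    Each example is a vertex; an edge links an example to each of its
    k closest examples under Euclidean distance.
    """
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        # Rank the other examples by distance to example i (index breaks ties).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def label_propagation(adj, max_iter=100):
    """Assign community labels by repeated neighbour majority vote.

    Illustrative stand-in for a community detection algorithm: each vertex
    adopts the most frequent label among its neighbours, and ties go to the
    smallest label so the result is deterministic.
    """
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated vertex keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break
    return labels


# Toy propositional data: two well-separated groups of feature vectors.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
graph = knn_graph(points, k=2)
clusters = label_propagation(graph)
print(clusters)
```

On this toy input the graph splits into two triangles, and each triangle ends up with a single label, so the detected communities coincide with the two groups of points. Real datasets would need a considered choice of similarity measure, of k, and of community detection method.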