Constructing and mapping fuzzy thematic clusters to higher ranks in a taxonomy

  • Authors:
  • Boris Mirkin; Susana Nascimento; Trevor Fenner; Luís Moniz Pereira

  • Affiliations:
  • School of Computer Science, Birkbeck University of London, London, UK and Division of Applied Mathematics, Higher School of Economics, Moscow, RF; Computer Science Department and Centre for Artificial Intelligence, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal; School of Computer Science, Birkbeck University of London, London, UK; Computer Science Department and Centre for Artificial Intelligence, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal

  • Venue:
  • KSEM'10 Proceedings of the 4th international conference on Knowledge science, engineering and management
  • Year:
  • 2010

Abstract

We present a novel methodology for mapping a system, such as a research department, to a related taxonomy in a thematically consistent way. Each component of the system is supplied with a fuzzy membership profile over the taxonomy. Our method generalizes these profiles in two steps: first by fuzzy clustering, and then by mapping the clusters to higher ranks of the taxonomy. To be specific, we concentrate on the Computer Science area as represented by the ACM Computing Classification System (ACM-CCS) taxonomy. We build fuzzy clusters of the taxonomy leaves according to the similarity between individual profiles, using a novel additive spectral fuzzy clustering method that, in contrast to other methods, involves a number of model-based stopping conditions. The resulting clusters are not necessarily consistent with the taxonomy. This is addressed by a novel method for parsimoniously elevating them to higher ranks of the taxonomy: an original recursive algorithm minimizes a penalty function that involves "head subjects" on the higher ranks of the taxonomy along with their "gaps" and "offshoots". An example illustrates the method applied to real-world data.
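To make the head/gap/offshoot trade-off concrete, here is a minimal sketch of a recursive lifting procedure in the same spirit. It is not the authors' algorithm. It assumes crisp (0/1) cluster membership rather than fuzzy profiles, a unit cost per head subject, and constant costs per gap and per offshoot. The names `Node`, `lift`, `gap_cost`, `offshoot_cost` and `head_cost` are hypothetical. Heads are restricted to internal nodes. A gap is taken to be a maximal subtree under a head that contains no cluster leaf, and an offshoot is a cluster leaf not covered by any head. The paper's penalty function works with fuzzy memberships and may weight its terms differently.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """A taxonomy node (hypothetical structure); leaves have no children."""
    name: str
    children: list[Node] = field(default_factory=list)


def lift(root: Node, cluster: set[str], gap_cost: float, offshoot_cost: float,
         head_cost: float = 1.0):
    """Crisp lifting sketch: returns (penalty, heads, gaps, offshoots)."""
    has_member: dict[str, bool] = {}

    def mark(t: Node) -> bool:
        # Record whether t's subtree contains at least one cluster leaf.
        if not t.children:
            has_member[t.name] = t.name in cluster
        else:
            # Build the full list so every child is visited (any() alone short-circuits).
            has_member[t.name] = any([mark(c) for c in t.children])
        return has_member[t.name]

    def covered(t: Node):
        # Cost and gap list when t lies inside some head's subtree.
        if not has_member[t.name]:
            return gap_cost, [t.name]          # maximal empty subtree = one gap
        cost, gaps = 0.0, []
        for c in t.children:
            c_cost, c_gaps = covered(c)
            cost += c_cost
            gaps += c_gaps
        return cost, gaps

    def uncovered(t: Node):
        # Cheapest option when no ancestor of t is a head.
        if not t.children:
            if t.name in cluster:
                return offshoot_cost, [], [], [t.name]   # uncovered member = offshoot
            return 0.0, [], [], []
        cost, heads, gaps, offs = 0.0, [], [], []
        for c in t.children:                  # option 1: t is not a head
            c_cost, c_heads, c_gaps, c_offs = uncovered(c)
            cost += c_cost
            heads += c_heads
            gaps += c_gaps
            offs += c_offs
        inner_cost, inner_gaps = covered(t)   # option 2: t becomes a head
        if head_cost + inner_cost < cost:
            return head_cost + inner_cost, [t.name], inner_gaps, []
        return cost, heads, gaps, offs

    mark(root)
    return uncovered(root)
```

Usage on a toy taxonomy: with the cluster {a1, a2, a3, b1}, `gap_cost=0.5` and `offshoot_cost=0.9`, the sketch lifts the cluster to head A and leaves b1 as an offshoot (penalty 1.9). Lowering `gap_cost` to 0.3 makes the root the single head, with gaps b2 and C (penalty 1.6). The relative gap and offshoot costs thus control how aggressively a cluster is generalized.

```python
root = Node("root", [
    Node("A", [Node("a1"), Node("a2"), Node("a3")]),
    Node("B", [Node("b1"), Node("b2")]),
    Node("C", [Node("c1"), Node("c2")]),
])
print(lift(root, {"a1", "a2", "a3", "b1"}, gap_cost=0.5, offshoot_cost=0.9))
```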