Reclassification as Supervised Clustering

  • Authors:
  • A. Sierra; F. Corbacho

  • Affiliations:
  • Escuela Técnica Superior de Informática, Universidad Autónoma de Madrid, 28049 Madrid, Spain; Escuela Técnica Superior de Informática, Universidad Autónoma de Madrid, 28049 Madrid, Spain

  • Venue:
  • Neural Computation
  • Year:
  • 2000

Abstract

In some branches of science, such as molecular biology, classes may be defined but not completely trusted. Sometimes posterior analysis proves them to be partially incorrect. Despite its relevance, this phenomenon has not received much attention within the neural computation community. We define reclassification as the task of redefining some given classes by maximum likelihood learning in a model that contains both supervised and unsupervised information. This approach leads to supervised clustering with an additional complexity penalizing term on the number of new classes. As a proof of concept, a simple reclassification algorithm is designed and applied to a data set of gene sequences. To test the performance of the algorithm, two of the original classes are merged. The algorithm is capable of unraveling the original three-class hidden structure, in contrast to the unsupervised version (K-means); moreover, it predicts the subdivision of one of the original classes into two different ones.
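
The idea of supervised clustering with a penalty on the number of classes can be illustrated with a minimal sketch. It is not the authors' algorithm or likelihood model. It assumes a K-means-style cost (squared Euclidean distance, plus a charge `lam` whenever a point's given label disagrees with the majority label of its cluster) and a linear penalty `penalty * k` on the number of classes. The names `farthest_first`, `supervised_kmeans`, `reclassify`, `lam`, and `penalty` are hypothetical, and the data are synthetic Gaussian blobs rather than gene sequences. Two of three underlying classes are merged in the given labels, mirroring the test described in the abstract.

```python
import numpy as np

def farthest_first(X, k):
    """Deterministic initialization (an assumption of this sketch): start at
    X[0], then repeatedly add the point farthest from the chosen centers."""
    centers = [X[0]]
    d2 = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        idx = int(d2.argmax())
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers, dtype=float)

def supervised_kmeans(X, y, k, lam=1.0, n_iter=100):
    """K-means variant: a point pays its squared distance to the center plus
    lam if its given label differs from the cluster's majority given label."""
    centers = farthest_first(X, k)
    # Start from a purely geometric assignment.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    for _ in range(n_iter):
        # Update step: recompute centers and majority labels per cluster.
        majority = np.zeros(k, dtype=int)
        for j in range(k):
            members = assign == j
            if members.any():
                centers[j] = X[members].mean(axis=0)
                majority[j] = np.bincount(y[members]).argmax()
        # Assignment step: geometry plus label-mismatch charge.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        cost = d2 + lam * (y[:, None] != majority[None, :])
        new_assign = cost.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    total = cost[np.arange(len(X)), assign].sum()
    return assign, total

def reclassify(X, y, k_max=6, lam=1.0, penalty=20.0):
    """Pick the number of classes k that minimizes fit cost + penalty * k."""
    best = None
    for k in range(1, k_max + 1):
        assign, fit = supervised_kmeans(X, y, k, lam)
        score = fit + penalty * k
        if best is None or score < best[0]:
            best = (score, k, assign)
    return best[1], best[2]

# Synthetic data: three well-separated classes in the plane.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([m + 0.5 * rng.standard_normal((30, 2)) for m in means])
true_class = np.repeat([0, 1, 2], 30)

# Given labels merge true classes 0 and 1 into a single class.
given = np.where(true_class == 2, 1, 0)

best_k, assign = reclassify(X, given)
print("selected number of classes:", best_k)
```

On data this well separated, the penalized cost should favor three classes, so the merged class is split back into its two components. Real data would need a more careful choice of `lam` and `penalty`, since both trade off against the scale of the distances.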