Regularization for unsupervised classification on taxonomies

  • Authors:
  • Diego Sona;Sriharsha Veeramachaneni;Nicola Polettini;Paolo Avesani

  • Affiliations:
  • ITC-IRST, Povo – Trento, Italy;ITC-IRST, Povo – Trento, Italy;ITC-IRST, Povo – Trento, Italy;ITC-IRST, Povo – Trento, Italy

  • Venue:
  • ISMIS'06 Proceedings of the 16th international conference on Foundations of Intelligent Systems
  • Year:
  • 2006

Abstract

We study unsupervised classification of text documents into a taxonomy of concepts, each annotated by only a few keywords. Our central claim is that the structure of the taxonomy encapsulates background knowledge that can be exploited to improve classification accuracy. Under our hierarchical Dirichlet generative model for the document corpus, we show that the unsupervised classification algorithm provides robust estimates of the classification parameters by performing regularization, and that our algorithm can be interpreted as a regularized EM algorithm. We also propose a technique for automatically choosing the regularization parameter. In addition, we propose a regularization scheme for K-means on hierarchies. We experimentally demonstrate that both of our regularized clustering algorithms achieve higher classification accuracy than simple models such as minimum distance, Naïve Bayes, EM and K-means.
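The abstract does not spell out how K-means is regularized on a hierarchy. Purely as an illustration, the sketch below assumes one common form of hierarchical shrinkage. Each concept's centroid is seeded from its keyword vector. After every update, the centroid is pulled toward the average centroid of its taxonomy neighbours (its parent and its children) by a hypothetical mixing weight `lam`. The function names, the cosine-similarity assignment and the fixed `lam` are all assumptions, not the authors' method. In particular, the paper's automatic choice of the regularization parameter is not reproduced here.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def dot(a, b):
    # For unit-length vectors this equals their cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def neighbours(node, parent):
    """Taxonomy neighbours of a node: its children plus its parent, if any."""
    nbrs = [c for c, p in parent.items() if p == node]
    if parent.get(node) is not None:
        nbrs.append(parent[node])
    return nbrs

def regularized_kmeans(docs, keywords, parent, lam=0.3, iters=10):
    """Hypothetical hierarchy-regularized K-means (illustrative sketch only).

    docs:     term-count vectors, one per document
    keywords: {node: keyword term vector} used to seed each concept
    parent:   {node: parent node, or None for the root}
    lam:      assumed shrinkage weight toward taxonomy neighbours
    """
    nodes = list(keywords)
    unit_docs = [normalize(d) for d in docs]
    centroids = {n: normalize(keywords[n]) for n in nodes}
    assign = []
    for _ in range(iters):
        # Assignment step: each document goes to its most similar concept.
        new_assign = [max(nodes, key=lambda n: dot(d, centroids[n])) for d in unit_docs]
        if new_assign == assign:
            break  # assignments are stable, so the centroids would not change
        assign = new_assign
        # Update step: per-concept mean; empty concepts fall back to keywords.
        raw = {}
        for n in nodes:
            members = [d for d, a in zip(unit_docs, assign) if a == n]
            if members:
                raw[n] = normalize([sum(col) for col in zip(*members)])
            else:
                raw[n] = normalize(keywords[n])
        # Regularization step: shrink each centroid toward its neighbours' mean.
        centroids = {}
        for n in nodes:
            nb = neighbours(n, parent)
            if nb:
                avg = [sum(col) / len(nb) for col in zip(*(raw[m] for m in nb))]
                mixed = [(1 - lam) * x + lam * y for x, y in zip(raw[n], avg)]
                centroids[n] = normalize(mixed)
            else:
                centroids[n] = raw[n]
    return assign, centroids

# Toy vocabulary: ["ball", "goal", "vote", "party"] (made-up example data).
parent = {"news": None, "sport": "news", "politics": "news"}
keywords = {"news": [1, 1, 1, 1], "sport": [1, 1, 0, 0], "politics": [0, 0, 1, 1]}
docs = [[3, 1, 0, 0], [0, 2, 0, 0], [0, 0, 2, 1], [0, 0, 0, 3]]
assign, centroids = regularized_kmeans(docs, keywords, parent)
print(assign)  # ['sport', 'sport', 'politics', 'politics']
```

Setting `lam=0` reduces this to plain keyword-seeded spherical K-means. Larger values pull each concept toward its neighbours in the taxonomy. This is one simple way to let the hierarchy stabilize estimates for concepts that attract few documents.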