Regularization for unsupervised classification on taxonomies

Authors:
Diego Sona;Sriharsha Veeramachaneni;Nicola Polettini;Paolo Avesani
Affiliations:
ITC-IRST, Povo – Trento, Italy;ITC-IRST, Povo – Trento, Italy;ITC-IRST, Povo – Trento, Italy;ITC-IRST, Povo – Trento, Italy
Venue:
ISMIS'06 Proceedings of the 16th international conference on Foundations of Intelligent Systems
Year:
2006

Abstract

We study the unsupervised classification of text documents into a taxonomy of concepts, each annotated with only a few keywords. Our central claim is that the structure of the taxonomy encapsulates background knowledge that can be exploited to improve classification accuracy. Under our hierarchical Dirichlet generative model for the document corpus, we show that the unsupervised classification algorithm provides robust estimates of the classification parameters by performing regularization, and that the algorithm can be interpreted as a regularized EM algorithm. We also propose a technique for automatically choosing the regularization parameter. In addition, we propose a regularization scheme for K-means on hierarchies. We demonstrate experimentally that both regularized clustering algorithms achieve higher classification accuracy than simple models such as minimum distance, Naïve Bayes, EM and K-means.
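
The abstract does not spell out the regularization scheme, so the following is only an illustrative sketch of what regularizing K-means on a hierarchy could look like, not the authors' algorithm. It assumes that each taxonomy node is seeded with a vector built from its keywords, and that after every update a node's centroid is pulled toward its parent's centroid. The function names, the weight `lam`, and the convex-combination update are all hypothetical choices made for illustration.

```python
def nearest(doc, centroids):
    """Return the node whose centroid is closest to doc (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda k: sum((d - c) ** 2 for d, c in zip(doc, centroids[k])),
    )


def regularized_kmeans(docs, seeds, parent, lam=0.3, iters=10):
    """K-means over taxonomy nodes, shrinking each centroid toward its parent.

    docs:   list of equal-length feature vectors (e.g. term counts)
    seeds:  dict node -> initial centroid built from the node's keywords
    parent: dict node -> parent node (None for the root)
    lam:    hypothetical regularization weight in [0, 1]; 0 gives plain K-means
    """
    centroids = {k: list(v) for k, v in seeds.items()}
    for _ in range(iters):
        labels = [nearest(d, centroids) for d in docs]
        prev = centroids
        centroids = {}
        for node in prev:
            members = [d for d, l in zip(docs, labels) if l == node]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
            else:
                mean = prev[node]  # empty cluster: keep the previous centroid
            p = parent.get(node)
            if p is None:
                centroids[node] = mean  # the root has nothing to shrink toward
            else:
                # Assumed regularizer: convex combination with the parent's
                # centroid from the previous iteration.
                centroids[node] = [
                    (1 - lam) * m + lam * q for m, q in zip(mean, prev[p])
                ]
    # Final assignment under the last set of centroids.
    return [nearest(d, centroids) for d in docs], centroids


# Toy taxonomy over the vocabulary ["sport", "football", "tennis"].
seeds = {"sport": [1, 0, 0], "football": [0, 1, 0], "tennis": [0, 0, 1]}
parent = {"sport": None, "football": "sport", "tennis": "sport"}
docs = [[1, 3, 0], [0, 2, 0], [1, 0, 3], [0, 0, 2]]
labels, centroids = regularized_kmeans(docs, seeds, parent, lam=0.3)
```

Setting `lam=0` recovers ordinary K-means seeded from the keywords. Larger values keep sparsely populated leaf nodes closer to their parent concept, which is the intuition behind using the taxonomy structure as background knowledge.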