Hierarchical Dirichlet model for document classification
ICML '05 Proceedings of the 22nd international conference on Machine learning
We study unsupervised classification of text documents into a taxonomy of concepts, each annotated by only a few keywords. Our central claim is that the structure of the taxonomy encapsulates background knowledge that can be exploited to improve classification accuracy. Under our hierarchical Dirichlet generative model for the document corpus, we show that the unsupervised classification algorithm obtains robust estimates of the classification parameters by performing regularization, and that it can be interpreted as a regularized EM algorithm. We also propose a technique for automatically choosing the regularization parameter. In addition, we propose a regularization scheme for K-means on hierarchies. We experimentally demonstrate that both regularized clustering algorithms achieve higher classification accuracy than simple models such as minimum distance, Naïve Bayes, EM, and K-means.
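The idea of regularizing K-means with a hierarchy can be illustrated with a minimal sketch. The abstract does not give the actual scheme, so everything below is an assumption made for illustration: each class centroid is seeded from a keyword vector, and at every update step it is shrunk toward its parent's centroid with a hypothetical weight `lam`, which minimizes the within-class squared error plus `lam` times the squared distance to the parent. The names `regularized_kmeans`, `seeds`, `parent`, and `lam` are illustrative and do not come from the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def regularized_kmeans(docs, seeds, parent, lam=1.0, iters=10):
    """Sketch of K-means with shrinkage toward parent centroids (an assumption).

    docs   : list of document vectors (lists of floats)
    seeds  : initial centroid per class, e.g. built from its keywords
    parent : parent[c] is the index of class c's parent, or None for the root
    lam    : hypothetical regularization weight; lam=0 gives plain K-means
    """
    centroids = [list(s) for s in seeds]
    dim = len(seeds[0])
    labels = [0] * len(docs)
    for _ in range(iters):
        # Assignment step: each document goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(d, centroids[c]))
            for d in docs
        ]
        # Update step: mean of members, pulled toward the parent's centroid
        # from the previous iteration.
        updated = []
        for c in range(len(centroids)):
            members = [d for d, l in zip(docs, labels) if l == c]
            n = len(members)
            weight = lam if parent[c] is not None else 0.0
            if n + weight == 0:
                # Empty class with no regularizer: keep the old centroid.
                updated.append(list(centroids[c]))
                continue
            anchor = centroids[parent[c]] if parent[c] is not None else centroids[c]
            total = [sum(m[j] for m in members) for j in range(dim)]
            updated.append(
                [(total[j] + weight * anchor[j]) / (n + weight) for j in range(dim)]
            )
        centroids = updated
    return labels, centroids


# Toy taxonomy: a root (class 0) with two children (classes 1 and 2).
docs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
seeds = [[5.0, 5.0], [0.0, 0.0], [10.0, 10.0]]
parent = [None, 0, 0]
labels, centroids = regularized_kmeans(docs, seeds, parent, lam=1.0)
```

With `lam=0` the update reduces to the ordinary K-means mean, so the shrinkage term is the only place where the taxonomy structure enters this sketch. How the weight would be chosen automatically is not covered here.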