Hierarchy-Regularized Latent Semantic Indexing

Authors:
Yi Huang;Kai Yu;Matthias Schubert;Shipeng Yu;Volker Tresp;Hans-Peter Kriegel
Affiliations:
University of Munich;Siemens Corporate Technology;University of Munich;University of Munich;Siemens Corporate Technology;University of Munich
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005



Abstract

Organizing textual documents into a hierarchical taxonomy is a common practice in knowledge management. Besides textual features, the hierarchical structure of directories reflects additional and important knowledge annotated by experts. It is generally desirable to incorporate this information into text mining processes. In this paper, we propose hierarchy-regularized latent semantic indexing, which encodes the hierarchy into a similarity graph of documents and then formulates an optimization problem that maps each document into a low-dimensional vector space. The new feature space preserves the intrinsic structure of the original taxonomy and thus provides a meaningful basis for various learning tasks such as visualization and classification. Our approach employs information about class proximity and class specificity, and can naturally cope with multi-labeled documents. Our empirical studies show very encouraging results on two real-world data sets: the new Reuters (RCV1) benchmark and the Swissprot protein database.
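
The abstract does not give the similarity function or the objective, so the following Python sketch is only an illustration of the general idea, not the authors' formulation. It assumes (hypothetically) that document similarity is the normalized depth of the deepest common ancestor of two classes, that multi-labeled documents take the maximum over label pairs, and that the embedding comes from the top eigenvectors of `K - lam * L`, where `K` is the linear document kernel and `L` is the graph Laplacian. All function names, the toy taxonomy, and the parameter `lam` are made up for this example.

```python
import numpy as np

def ancestors(node, parent):
    """Path from the root down to `node`, inclusive."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path[::-1]

def class_similarity(a, b, parent):
    """Assumed proximity: depth of the deepest common ancestor,
    divided by the depth of the deeper class (root has depth 0)."""
    pa, pb = ancestors(a, parent), ancestors(b, parent)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Sharing only the root counts as zero similarity.
    return (common - 1) / max(len(pa) - 1, len(pb) - 1, 1)

def document_graph(labels, parent):
    """Symmetric document similarity matrix; multi-labeled documents
    use the best-matching pair of labels (an assumption)."""
    n = len(labels)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = max(
                class_similarity(a, b, parent)
                for a in labels[i] for b in labels[j]
            )
    return W

def hierarchy_regularized_embedding(X, W, k=2, lam=1.0):
    """Top-k eigenvectors of K - lam * L, with K = X X^T the linear
    kernel and L = D - W the unnormalized graph Laplacian."""
    K = X @ X.T
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(K - lam * L)  # eigenvalues ascending
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order]

# Toy taxonomy: each class maps to its parent, the root maps to None.
parent = {"root": None, "sports": "root", "science": "root",
          "soccer": "sports", "tennis": "sports", "physics": "science"}
# Four documents; the last one carries two labels.
labels = [["soccer"], ["tennis"], ["physics"], ["soccer", "physics"]]
rng = np.random.default_rng(0)
X = rng.random((4, 6))  # stand-in for a document-term matrix

W = document_graph(labels, parent)
Z = hierarchy_regularized_embedding(X, W, k=2, lam=1.0)
```

With `lam = 0` the sketch reduces to plain latent semantic indexing on the document kernel; larger values of `lam` pull documents from nearby classes closer together in the embedding `Z`.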