Heterogeneous information integration in hierarchical text classification

Authors:
Huai-Yuan Yang;Tie-Yan Liu;Li Gao;Wei-Ying Ma
Affiliations:
5F Sigma Center, Microsoft Research Asia, Beijing, P.R. China;5F Sigma Center, Microsoft Research Asia, Beijing, P.R. China;Department of Scientific & Engineering Computing School of Mathematical Sciences, Peking University, Beijing, P.R. China;5F Sigma Center, Microsoft Research Asia, Beijing, P.R. China
Venue:
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Year:
2006

Abstract

Previous work has shown that taking the category distance in the taxonomy tree into account can improve the performance of text classifiers. In this paper, we propose a new approach that integrates further categorical information from the text corpus using the principle of multi-objective programming (MOP). That is, we consider not only the distance between categories defined by the branching of the taxonomy tree, but also the similarity between categories defined by the document/term distributions in the feature space. We then use MOP to combine these two kinds of information into a refined category distance. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithm in hierarchical text classification.
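
The idea of blending the two category distances can be illustrated with a minimal Python sketch. The abstract does not give the paper's MOP formulation, so this sketch substitutes a plain weighted-sum scalarization, one common way to reduce two objectives to one. The function names, the `weight` and `max_tree_dist` parameters, the use of category centroids, and the cosine distance are all assumptions made for illustration.

```python
import math


def tree_distance(parent, a, b):
    """Number of edges on the path between categories a and b in the taxonomy.

    `parent` maps each non-root category to its parent category.
    """
    def ancestors(node):
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    path_a, path_b = ancestors(a), ancestors(b)
    steps_from_a = {node: i for i, node in enumerate(path_a)}
    for steps_from_b, node in enumerate(path_b):
        if node in steps_from_a:  # lowest common ancestor
            return steps_from_a[node] + steps_from_b
    raise ValueError("categories are not in the same tree")


def cosine_distance(u, v):
    """1 - cosine similarity of two term-weight vectors (1.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm else 1.0


def refined_distance(parent, centroids, a, b, weight=0.5, max_tree_dist=1.0):
    """Weighted sum of normalized tree distance and feature-space distance.

    Assumption: this scalarization stands in for the paper's MOP step.
    """
    d_tree = tree_distance(parent, a, b) / max_tree_dist
    d_feat = cosine_distance(centroids[a], centroids[b])
    return weight * d_tree + (1.0 - weight) * d_feat


# Hypothetical toy taxonomy and category centroids.
parent = {"sports": "root", "science": "root",
          "soccer": "sports", "tennis": "sports"}
centroids = {"soccer": [1.0, 0.0], "tennis": [1.0, 0.0], "science": [0.0, 1.0]}

print(refined_distance(parent, centroids, "soccer", "tennis", max_tree_dist=4))
print(refined_distance(parent, centroids, "soccer", "science", max_tree_dist=4))
```

In this toy setup, sibling categories with identical centroids end up closer than categories in different branches with orthogonal centroids, which is the qualitative behavior a refined category distance is meant to capture.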