Kernel-Based Text Classification on Statistical Manifold

Authors:
Shibin Zhou;Shidong Feng;Yushu Liu
Affiliations:
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, P.R. China 100081;School of Computer Science and Technology, Beijing Institute of Technology, Beijing, P.R. China 100081;School of Computer Science and Technology, Beijing Institute of Technology, Beijing, P.R. China 100081
Venue:
ISNN '08 Proceedings of the 5th international symposium on Neural Networks: Advances in Neural Networks
Year:
2008


Abstract

Many researchers have developed useful kernel methods for text. However, common kernel-based approaches to text categorization characteristically embed text data in Euclidean space. In this paper, we focus on representing text vectors as points on a Riemannian manifold and use kernels to integrate discriminative and generative models. We then present a diffusion kernel based on the Dirichlet Compound Multinomial (DCM) manifold, a space built on the Dirichlet Compound Multinomial model that combines inverse document frequency and information gain. Our experimental results on various real-world text datasets show that the kernel based on the DCM manifold is better suited to text categorization than kernels based on Euclidean space, and that our kernel method achieves much better accuracy than some current state-of-the-art methods.
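
To illustrate what a diffusion kernel on a statistical manifold looks like in practice, here is a minimal sketch for the simpler multinomial manifold, following the leading-order approximation of the information diffusion kernel of Lafferty and Lebanon. It is not the DCM manifold kernel proposed in this paper, whose exact form the abstract does not give. The function names, the L1 normalization of term counts, and the default diffusion time `t` are assumptions made for illustration, and the dimension-dependent normalizing constant is dropped.

```python
import math


def l1_normalize(counts):
    """Map a term-count vector to a point on the multinomial simplex."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("document has no terms")
    return [c / total for c in counts]


def geodesic_distance(theta, theta_prime):
    """Fisher geodesic distance between two multinomial parameter vectors.

    Under the Fisher information metric the simplex is isometric to part of
    a sphere, so the distance is 2 * arccos(sum_i sqrt(theta_i * theta'_i)).
    """
    bc = sum(math.sqrt(a * b) for a, b in zip(theta, theta_prime))
    bc = min(1.0, max(0.0, bc))  # guard against floating-point overshoot
    return 2.0 * math.acos(bc)


def diffusion_kernel(theta, theta_prime, t=0.5):
    """Leading-order heat-kernel approximation exp(-d^2 / (4t)).

    The constant factor (4 * pi * t)^(-n/2) is omitted (an assumption of
    this sketch); it only rescales the kernel uniformly.
    """
    d = geodesic_distance(theta, theta_prime)
    return math.exp(-d * d / (4.0 * t))


# Three toy documents over a four-term vocabulary (hypothetical counts).
doc_a = l1_normalize([3, 1, 0, 2])
doc_b = l1_normalize([2, 2, 1, 1])
doc_c = l1_normalize([0, 0, 5, 0])

k_ab = diffusion_kernel(doc_a, doc_b)
k_ac = diffusion_kernel(doc_a, doc_c)
```

In this sketch, documents that share vocabulary (`doc_a`, `doc_b`) receive a larger kernel value than documents with disjoint vocabulary (`doc_a`, `doc_c`), whose geodesic distance reaches its maximum of π. A matrix of such values could be passed to an SVM implementation that accepts precomputed kernels.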