Using non-extensive entropy for text classification

  • Authors:
  • Lin Fu; Yuexian Hou

  • Affiliations:
  • School of Computer Science and Technology, Tianjin University, China (both authors)

  • Venue:
  • ICIC'09 Proceedings of the 5th international conference on Emerging intelligent computing technology and applications
  • Year:
  • 2009

Abstract

This paper proposes the use of non-extensive entropy for text classification: a classifier is built by estimating the conditional distribution of the class variable given a document. The underlying principle of maximum entropy is that, absent external knowledge, one should prefer distributions that are as uniform as possible. Two models based on the maximum entropy principle are proposed. The first extends Shannon entropy to non-extensive entropy, which simplifies the form of the classifier; the second introduces high-level constraints into the non-extensive model, imposing constraints on pairs of entities. The model with high-level constraints builds semantic relations between word pairs, with the aim of improving classification accuracy. Experiments on the 20 Newsgroups dataset demonstrate the advantage of the non-extensive model and of the non-extensive model with high-level constraints.
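To illustrate the quantity the abstract refers to, here is a minimal sketch (not code from the paper) of non-extensive entropy, commonly written as Tsallis entropy S_q(p) = (1 - Σ_i p_i^q) / (q - 1), which recovers Shannon entropy as q → 1. The function name and the example distributions are illustrative assumptions.

```python
import math

def tsallis_entropy(p, q):
    """Non-extensive (Tsallis) entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    Reduces to Shannon entropy (in nats) in the limit q -> 1, so we
    special-case q == 1 to avoid dividing by zero.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

# Uniform distributions maximise the entropy -- the "prefer uniform
# distributions absent external knowledge" principle from the abstract.
uniform = [0.25] * 4
peaked = [0.85, 0.05, 0.05, 0.05]
assert tsallis_entropy(uniform, q=2.0) > tsallis_entropy(peaked, q=2.0)

# As q approaches 1, the value approaches the Shannon entropy ln(4).
shannon = -sum(pi * math.log(pi) for pi in uniform)
assert abs(tsallis_entropy(uniform, q=1.000001) - shannon) < 1e-3
```

In a maximum-entropy classifier, the conditional class distribution p(class | document) would be chosen to maximise this entropy subject to feature-expectation constraints; the paper's second model additionally constrains word-pair features.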