Using non-extensive entropy for text classification

  • Authors:
  • Lin Fu; Yuexian Hou

  • Affiliations:
  • School of Computer Science and Technology, Tianjin University, China (both authors)

  • Venue:
  • ICIC'09 Proceedings of the 5th international conference on Emerging intelligent computing technology and applications
  • Year:
  • 2009

Abstract

This paper proposes the use of non-extensive entropy for text classification: a classifier is built by estimating the conditional distribution of the class variable given a document. The underlying principle of maximum entropy is that, absent external knowledge, one should prefer distributions that are as uniform as possible. Two models based on the maximum entropy principle are proposed. The first extends Shannon entropy to non-extensive entropy, which simplifies the form of the classifier; the second introduces high-level constraints into the non-extensive model, imposing constraints on pairs of entities. The model with high-level constraints builds semantic relations between word pairs, with the aim of improving classification accuracy. Experiments on the 20 Newsgroups dataset demonstrate the advantage of the non-extensive model and of the non-extensive model with high-level constraints.
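To illustrate the quantity the abstract refers to, here is a minimal sketch (not code from the paper) of non-extensive entropy, commonly written as Tsallis entropy S_q(p) = (1 - Σ_i p_i^q) / (q - 1), which recovers Shannon entropy as q → 1. The function name and the example distributions are illustrative assumptions.

```python
import math

def tsallis_entropy(p, q):
    """Non-extensive (Tsallis) entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    Reduces to Shannon entropy (in nats) in the limit q -> 1, so we
    special-case q == 1 to avoid dividing by zero.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

# Uniform distributions maximise the entropy -- the "prefer uniform
# distributions absent external knowledge" principle from the abstract.
uniform = [0.25] * 4
peaked = [0.85, 0.05, 0.05, 0.05]
assert tsallis_entropy(uniform, q=2.0) > tsallis_entropy(peaked, q=2.0)

# As q approaches 1, the value approaches the Shannon entropy ln(4).
shannon = -sum(pi * math.log(pi) for pi in uniform)
assert abs(tsallis_entropy(uniform, q=1.000001) - shannon) < 1e-3
```

In a maximum-entropy classifier, the conditional class distribution p(class | document) would be chosen to maximise this entropy subject to feature-expectation constraints; the paper's second model additionally constrains word-pair features.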