Random-Walk Term Weighting for Improved Text Classification

  • Authors: Samer Hassan; Rada Mihalcea; Carmen Banea

  • Affiliations: University of North Texas, USA (all three authors)

  • Venue: ICSC '07 Proceedings of the International Conference on Semantic Computing
  • Year: 2007

Abstract

This paper describes a new approach for estimating term weights in a document and shows how the new weighting scheme can be used to improve the accuracy of a text classifier. The method uses term co-occurrence as a measure of dependency between word features. A random-walk model is applied to a graph encoding words and their co-occurrence dependencies, producing scores that quantify how much a particular word feature contributes to a given context. Experiments on three standard classification datasets show that the new random-walk-based approach outperforms the traditional term-frequency approach to feature weighting.
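
The idea of scoring words by a random walk over a co-occurrence graph can be illustrated with a minimal sketch. The abstract does not give the graph construction or the update rule, so everything specific here is an assumption made for illustration: the function name `random_walk_weights`, an undirected and unweighted graph, a co-occurrence window of 2 tokens, a PageRank-style update, a damping factor of 0.85, and a fixed number of iterations. It should not be read as the authors' exact method.

```python
from collections import defaultdict


def random_walk_weights(tokens, window=2, damping=0.85, iterations=50):
    """Score each distinct token with a PageRank-style walk (illustrative only).

    All parameter defaults are assumptions, not values taken from the paper.
    """
    # Build an undirected co-occurrence graph: two distinct tokens are linked
    # if they appear within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        neighbors[word]  # register the node even if it ends up isolated
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    graph = dict(neighbors)

    # Start from uniform scores and repeatedly apply the random-walk update:
    # each node receives an equal share of the score of every neighbor.
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        updated = {}
        for word, linked in graph.items():
            incoming = sum(scores[n] / len(graph[n]) for n in linked)
            updated[word] = (1.0 - damping) + damping * incoming
        scores = updated
    return scores


if __name__ == "__main__":
    doc = ["classifier", "weights", "term", "weights",
           "document", "weights", "graph"]
    for word, score in sorted(random_walk_weights(doc).items(),
                              key=lambda item: -item[1]):
        print(f"{word}\t{score:.3f}")
```

In this toy document, "weights" co-occurs with every other word, so it ends up with the highest score. Under the assumptions above, the resulting scores could stand in for raw term frequency when building the feature vector passed to a classifier.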