Text similarity computing based on standard deviation

  • Authors:
  • Tao Liu; Jun Guo

  • Affiliations:
  • School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, China (both authors)

  • Venue:
  • ICIC'05: Proceedings of the 2005 International Conference on Advances in Intelligent Computing, Volume Part I
  • Year:
  • 2005


Abstract

Automatic text categorization is the task of assigning free-text documents to one or more predefined categories based on their content. The classical method for computing text similarity is to calculate the cosine of the angle between the documents' vectors. To improve categorization performance, this paper proposes a new algorithm that computes text similarity based on standard deviation. Experiments on Chinese text documents demonstrate the validity and feasibility of the standard-deviation-based algorithm.
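The abstract names cosine similarity as the classical baseline. The paper's own standard-deviation-based measure is not specified in this summary, so the following is only a minimal sketch of that cosine baseline. It assumes raw term-frequency vectors over already tokenized text (Chinese text would first need word segmentation) and uses a hypothetical nearest-prototype helper, `categorize`, to show how a similarity score can drive category assignment.

```python
import math
from collections import Counter


def term_vector(tokens):
    """Raw term-frequency vector; a simplifying assumption (TF-IDF weighting is also common)."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Dot product over the terms the two vectors share.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def categorize(doc_tokens, category_examples):
    """Hypothetical helper: pick the category whose example text is most cosine-similar."""
    doc = term_vector(doc_tokens)
    return max(
        category_examples,
        key=lambda c: cosine_similarity(doc, term_vector(category_examples[c])),
    )


examples = {
    "sports": "match team goal score team".split(),
    "finance": "stock market price bank stock".split(),
}
print(categorize("team wins match".split(), examples))  # prints "sports"
```

The new document shares "team" and "match" with the sports example and no terms with the finance example, so its cosine score is positive only for "sports". A standard-deviation-based measure, as the paper proposes, would replace `cosine_similarity` in this loop; its exact definition is not given here.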