Text similarity computing based on standard deviation

Authors:
Tao Liu; Jun Guo
Affiliations:
School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, China (both authors)
Venue:
ICIC'05 Proceedings of the 2005 international conference on Advances in Intelligent Computing - Volume Part I
Year:
2005

References:
16
Cited by:
2

Abstract

Automatic text categorization is the task of assigning free-text documents to one or more predefined categories based on their content. The classical method for computing text similarity is to calculate the cosine of the angle between document vectors. To improve categorization performance, this paper proposes a new algorithm that computes text similarity based on standard deviation. Experiments on Chinese text documents show the validity and feasibility of the standard-deviation-based algorithm.
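For reference, the classical cosine baseline named above can be sketched as follows. This is a minimal sketch of the standard cosine method, not the paper's standard-deviation algorithm, which the abstract does not specify. It rests on several assumptions:

- It uses raw term-frequency weights (TF-IDF is another common choice).
- It builds one term vector per predefined category.
- It assumes tokenization has already been done. Chinese text would first need word segmentation.

The names `term_vector`, `cosine_similarity`, and `categorize` are hypothetical.

```python
import math
from collections import Counter


def term_vector(tokens):
    """Build a sparse term-frequency vector (assumed weighting: raw counts)."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(doc_tokens, category_vectors):
    """Hypothetical helper: pick the category whose vector is most cosine-similar."""
    doc = term_vector(doc_tokens)
    return max(category_vectors,
               key=lambda name: cosine_similarity(doc, category_vectors[name]))


if __name__ == "__main__":
    # Toy, pre-tokenized category profiles (illustrative data only).
    categories = {
        "sports": term_vector("match goal team score team".split()),
        "finance": term_vector("stock market price bank".split()),
    }
    print(categorize("the team scored a late goal".split(), categories))  # prints: sports
```

In this toy run, the document shares "team" and "goal" with the sports profile and no terms with the finance profile, so it lands in sports. The paper's contribution is to replace the `cosine_similarity` step with a standard-deviation-based measure.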