The Similarity Computing of Documents Based on VSM

Authors:
Qinglin Guo
Affiliations:
Department of Computer Science and Technology, North China Electric Power University, Beijing, China 102206 and Department of Computer Science and Technology, Peking University, Beijing, China 100 ...
Venue:
NBiS '08 Proceedings of the 2nd international conference on Network-Based Information Systems
Year:
2008

Citing 3
Cited 0

Noise reduction in a statistical approach to text categorization

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Modern Information Retrieval

A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning

Abstract

The precision and efficiency of document similarity computation are the foundation of other document-processing tasks. In this paper, we improve the DF (document frequency) and TF-IDF algorithms. First, DF-based feature selection runs in linear time, which suits large document collections, but it can discard rare yet useful features; we compensate by also counting the occurrences of words at important positions in a document. Second, we adjust the weight of each feature using the result of the feature selection phase. In this way, we improve the precision of document similarity without adding much time or space complexity.
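The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that the "important place" is the document title, that a title occurrence adds a fixed bonus to a term's DF score, and that the selection score rescales the TF-IDF weight. The names `build_vectors`, `cosine`, `min_score`, and `title_bonus` are hypothetical, as is the smoothed IDF form.

```python
import math
from collections import Counter


def build_vectors(docs, min_score=2.0, title_bonus=1.0):
    """docs is a list of (title_tokens, body_tokens) pairs.

    Assumption: selection score = DF + title_bonus * (number of titles
    containing the term). This is an illustrative stand-in for "adding the
    count of the words at the important places".
    """
    n = len(docs)
    df, title_hits = Counter(), Counter()
    for title, body in docs:
        df.update(set(title) | set(body))   # document frequency
        title_hits.update(set(title))       # documents with the term in the title

    score = {t: df[t] + title_bonus * title_hits[t] for t in df}
    features = {t: s for t, s in score.items() if s >= min_score}
    top = max(features.values(), default=1.0)

    vectors = []
    for title, body in docs:
        tf = Counter(title + body)
        vec = {}
        for t, s in features.items():
            if tf[t]:
                # Smoothed IDF (an assumption) so the weight is never zero.
                idf = math.log((1 + n) / (1 + df[t])) + 1.0
                # Assumption: rescale TF-IDF by the normalized selection score.
                vec[t] = tf[t] * idf * (s / top)
        vectors.append(vec)
    return features, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [
    (["power", "grid"], ["power", "grid", "load", "forecast"]),
    (["power", "load"], ["power", "load", "forecast", "model"]),
    (["text", "retrieval"], ["text", "retrieval", "index", "model"]),
]
features, vectors = build_vectors(docs)
print(sorted(features))
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

In this toy collection, "grid" and "index" both occur in a single document, so a plain DF threshold of 2 would drop both. Under the assumed title bonus, "grid" is kept because it appears in a title, while "index" is still removed. Both passes are linear in the total number of tokens.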