The Similarity Computing of Documents Based on VSM

  • Authors:
  • Qinglin Guo

  • Affiliations:
  • Department of Computer Science and Technology, North China Electric Power University, Beijing, China 102206 and Department of Computer Science and Technology, Peking University, Beijing, China 100 ...

  • Venue:
  • NBiS '08 Proceedings of the 2nd international conference on Network-Based Information Systems
  • Year:
  • 2008

Abstract

The precision and efficiency of document similarity computation are the foundation of, and key to, other document-processing tasks. In this paper, the DF (document frequency) and TF-IDF algorithms are improved. First, DF-based feature selection has linear time complexity, which suits large document collections, but it can discard rare yet useful features; we compensate for this by also counting occurrences of words in important positions of a document. Second, we adjust each feature's weight using the result of the feature-selection phase. In this way, we improve the precision of document similarity without adding much time or space complexity.
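The abstract does not give exact formulas, so the following is only a minimal Python sketch of the general VSM pipeline it describes, under loudly stated assumptions. Each document is assumed to be a pair of (title tokens, body tokens), with the title standing in for the "important places". The names `select_features`, `weigh`, `cosine`, `min_score` and `title_bonus` are hypothetical. The position bonus added to DF and the `score / DF` scaling of TF-IDF weights are illustrative choices, not the paper's method. Similarity is the standard cosine between two sparse VSM vectors.

```python
import math
from collections import Counter

# Assumption: a document is (title_tokens, body_tokens); the title plays the
# role of the "important places" mentioned in the abstract.

def select_features(docs, min_score=2, title_bonus=1):
    """Hypothetical DF feature selection with a position bonus.

    score(t) = DF(t) + title_bonus * (number of documents whose title has t).
    Terms with score >= min_score are kept. Returns {term: (df, score)}.
    """
    df, title_hits = Counter(), Counter()
    for title, body in docs:
        df.update(set(title) | set(body))  # count each term once per document
        title_hits.update(set(title))
    kept = {}
    for term, d in df.items():
        score = d + title_bonus * title_hits[term]
        if score >= min_score:
            kept[term] = (d, score)
    return kept


def weigh(doc, features, n_docs):
    """TF-IDF weights over the selected features, scaled by score / DF.

    The scaling is an illustrative stand-in for "adjusting weights with the
    selection result"; it equals 1 for terms that never appear in a title.
    A term present in every document gets idf = log(1) = 0.
    """
    title, body = doc
    tf = Counter(title + body)
    vec = {}
    for term, count in tf.items():
        if term in features:
            d, score = features[term]
            idf = math.log(n_docs / d)
            vec[term] = count * idf * (score / d)
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy corpus (invented for illustration only).
docs = [
    (["power", "grid"], ["grid", "load", "forecast"]),
    (["grid"], ["grid", "load", "model"]),
    (["vsm"], ["text", "similarity", "model"]),
]
features = select_features(docs)
vecs = [weigh(d, features, len(docs)) for d in docs]
print(round(cosine(vecs[0], vecs[1]), 3))  # prints 0.588
```

In this toy corpus, `power` and `vsm` each occur in only one document and would be dropped by plain DF thresholding at `min_score=2`; the title bonus keeps them, while `forecast`, `text` and `similarity` are still removed. Selection is a single pass over the documents, so it stays linear in corpus size.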