Documents similarity measurement using field association terms

  • Authors:
  • El-Sayed Atlam;M. Fuketa;K. Morita;Jun-ichi Aoe

  • Affiliations:
  • Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan (all four authors)

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2003

Abstract

Conventional approaches to text analysis and information retrieval, which measure document similarity by considering all of the information in the texts, are relatively inefficient for processing large text collections in heterogeneous subject areas. This paper outlines a new text manipulation system, FA-Sim, that is useful for retrieving information in large heterogeneous texts and for recognizing content similarity in text excerpts. FA-Sim is based on flexible text matching procedures carried out in various contexts and at various field ranks. It measures text similarity using specific field association (FA) terms instead of comparing all of the text information. FA-Sim computes similarity between texts faster, and with higher accuracy, than two other analysis methods: recall and precision improved significantly, by 39% and 37% respectively, over these two traditional methods.
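The central idea, comparing texts through the FA terms they contain rather than through all of their words, can be illustrated with a minimal Python sketch. It assumes a small hypothetical dictionary `FA_TERMS` that maps each FA term to a field, builds a per-field count vector for each text, and compares those vectors with cosine similarity. The dictionary entries, the function names (`fa_profile`, `cosine`, `fa_similarity`), and the choice of cosine similarity are all assumptions made for illustration. This is not the FA-Sim matching procedure, and it does not model the field ranks or contexts described above.

```python
from collections import Counter
import math

# HYPOTHETICAL FA-term dictionary (term -> field). The entries are invented for
# illustration; the paper's actual dictionary, fields, and field ranks are not modeled.
FA_TERMS = {
    "home run": "sports/baseball",
    "pitcher": "sports/baseball",
    "goalkeeper": "sports/soccer",
    "stock": "economics",
    "interest rate": "economics",
}


def fa_profile(text, fa_terms=FA_TERMS):
    """Map a text to counts per field, using only the FA terms it contains.

    Every word that is not an FA term is ignored. Matching is a naive
    case-insensitive substring count, which is enough for a sketch.
    """
    lowered = text.lower()
    profile = Counter()
    for term, field in fa_terms.items():
        hits = lowered.count(term)
        if hits:
            profile[field] += hits
    return profile


def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fa_similarity(text_a, text_b, fa_terms=FA_TERMS):
    """Similarity of two texts, compared only through their FA-term field profiles."""
    return cosine(fa_profile(text_a, fa_terms), fa_profile(text_b, fa_terms))


doc1 = "He hit a home run in the ninth inning."
doc2 = "The starting pitcher struck out nine batters."
doc3 = "The central bank raised the interest rate and stock prices fell."

print(fa_similarity(doc1, doc2))  # 1.0: different FA terms, same field
print(fa_similarity(doc1, doc3))  # 0.0: no field in common
```

Here `doc1` and `doc2` share no FA term, yet both map to the same field and score 1.0, while `doc3` maps to a different field and scores 0.0. Each text is reduced to a handful of field counts, which hints at why restricting comparison to FA terms can be cheaper than comparing full texts. A realistic version would need proper tokenization (the substring match would, for example, count "stockholder" as a hit for "stock") and would weight terms by field rank.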