Calculating similarity between texts using graph-based text representation model

  • Authors:
  • Junji Tomita;Hidekazu Nakawatase;Megumi Ishii

  • Affiliations:
  • NTT Corporation, Kanagawa, Japan;NTT Corporation, Kanagawa, Japan;NTT Corporation, Kanagawa, Japan

  • Venue:
  • Proceedings of the thirteenth ACM international conference on Information and knowledge management
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Knowledge discovery from a large volumes of texts usually requires many complex analysis steps. The graph-based text representation model has been proposed to simplify the steps. The model represents texts in a formal manner, Subject Graphs, and provides text handling operations whose inputs and outputs are identical in form, i.e. a set of subject graphs, so they can be combined in any order. A subject graph uses node weight to represent the significance of each term, and link weight to represent that of each term-term association. This paper concentrates on the algorithms for making subject graphs and calculating the similarity between them. An evaluation shows that Subject Graphs can calculate the similarity between texts more precisely than term vectors, since they incorporate the significance of association between terms.