Calculating similarity between texts using graph-based text representation model

Authors:
Junji Tomita;Hidekazu Nakawatase;Megumi Ishii
Affiliations:
NTT Corporation, Kanagawa, Japan;NTT Corporation, Kanagawa, Japan;NTT Corporation, Kanagawa, Japan
Venue:
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Year:
2004

Abstract

Knowledge discovery from large volumes of text usually requires many complex analysis steps. A graph-based text representation model has been proposed to simplify these steps. The model represents texts in a formal structure, the subject graph, and provides text-handling operations whose inputs and outputs share the same form, namely a set of subject graphs, so the operations can be combined in any order. A subject graph uses node weights to represent the significance of each term and link weights to represent the significance of each term-term association. This paper concentrates on the algorithms for constructing subject graphs and calculating the similarity between them. An evaluation shows that subject graphs measure the similarity between texts more precisely than term vectors, because they incorporate the significance of associations between terms.
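
The abstract does not give the construction or similarity algorithms, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes that node weight is raw term frequency, that link weight is the number of sentences in which two terms co-occur, and that similarity is an equally weighted mix of cosine similarity over node weights and over link weights. The names `build_subject_graph`, `cosine`, `graph_similarity`, and the parameter `alpha` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_subject_graph(text):
    """Return (node_weights, link_weights) for a text.

    Assumptions (not from the paper): node weight = term frequency,
    link weight = number of sentences in which both terms appear.
    """
    nodes = Counter()
    links = Counter()
    for sentence in text.lower().split("."):
        terms = sentence.split()
        nodes.update(terms)
        # Sort so each undirected link has one canonical key.
        for a, b in combinations(sorted(set(terms)), 2):
            links[(a, b)] += 1
    return nodes, links


def cosine(u, v):
    """Cosine similarity between two sparse weight mappings."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def graph_similarity(g1, g2, alpha=0.5):
    """Mix node-level and link-level cosine similarity (alpha is assumed)."""
    return alpha * cosine(g1[0], g2[0]) + (1 - alpha) * cosine(g1[1], g2[1])


# Two texts with identical terms but different term-term associations.
g1 = build_subject_graph("dog bites man. cat sleeps")
g2 = build_subject_graph("man bites cat. dog sleeps")

term_vector_sim = cosine(g1[0], g2[0])   # term vectors cannot tell them apart
subject_graph_sim = graph_similarity(g1, g2)  # links expose the difference
print(term_vector_sim, subject_graph_sim)
```

In this toy case the two texts contain exactly the same terms, so a term-vector comparison treats them as equivalent, while the link weights lower the graph-based score because only one association (`bites` with `man`) is shared. This illustrates why incorporating associations can separate texts that term vectors conflate.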